Community Detection and Classification in
Hierarchical Stochastic Blockmodels 

Vince Lyzinski, Minh Tang, Avanti Athreya, Youngser Park, Carey E. Priebe 


Abstract —In disciplines as diverse as social network analysis and neuroscience, many large graphs are believed to be composed of 
loosely connected smaller graph primitives, whose structure is more amenable to analysis. We propose a robust, scalable, integrated
methodology for community detection and community comparison in graphs. In our procedure, we first embed a graph into an 
appropriate Euclidean space to obtain a low-dimensional representation, and then cluster the vertices into communities. We next 
employ nonparametric graph inference techniques to identify structural similarity among these communities. These two steps are 
then applied recursively on the communities, allowing us to detect more fine-grained structure. We describe a hierarchical stochastic 
blockmodel (namely, a stochastic blockmodel with a natural hierarchical structure) and establish conditions under which our algorithm
yields consistent estimates of model parameters and motifs, which we define to be stochastically similar groups of subgraphs. Finally,
we demonstrate the effectiveness of our algorithm in both simulated and real data. Specifically, we address the problem of locating 
similar sub-communities in a partially reconstructed Drosophila connectome and in the social network Friendster. 
1 Introduction

The representation of data as graphs, with the vertices
as entities and the edges as relationships between
the entities, is now ubiquitous in many application domains:
for example, social networks, in which vertices
represent individual actors or organizations [1]; neuroscience,
in which vertices are neurons or brain regions
[2]; and document analysis, in which vertices represent
authors or documents [3]. This representation has proven
invaluable in describing and modeling the intrinsic and
complex structure that underlies these data.

In understanding the structure of large, complex graphs,
a central task is that of identifying and classifying local,
lower-dimensional structure, and more specifically,
consistently and scalably estimating subgraphs and
subcommunities. In disciplines as diverse as social network
analysis and neuroscience, many large graphs are
believed to be composed of loosely connected smaller
graph primitives, whose structure is more amenable to
analysis. For example, the widely studied social network
Friendster, which has approximately 60 million users
and 2 billion edges, is believed to consist of over 1
million communities at the local scale. Insomuch as the
communication structure of these social communities both
influences and is influenced by the function of the social
community, we expect there to be repeated structure
across many of these communities (see Section 5). As
a second motivating example, the neuroscientific cortical
column conjecture [4], [5] posits that the neocortex of the
human brain employs algorithms composed of repeated
instances of a limited set of computing primitives. By
modeling certain portions of the cortex as a hierarchical
random graph, the cortical column conjecture can be
interpreted as a problem of community detection and
classification within a graph. While the full data needed
to test the cortical column conjecture is not yet available
[6], it nonetheless motivates our present approach of
theoretically sound, robust hierarchical community
detection and community classification.

Community detection for graphs is a well-established
field of study, and there are many techniques and
methodologies available, such as those based on maximizing
modularity and likelihood [7], [8], [9], random
walks [10], [11], and spectral clustering and partitioning
[12], [13], [14], [15], [16], [17]. While many of these results
focus on the consistency of the algorithms—namely, that 
the proportion of misclassified vertices goes to zero— 
the key results in this paper give guarantees on the 
probability of perfect clustering, in which no vertices at all 
are misclassified. As such, they are similar in spirit to the 
results of 0 and represent a considerable improvement 
over our earlier clustering results from [18]. As might be
expected, though, the strength of our results depends
on the average degree of the graph, which we require
to grow at least at order $\sqrt{n}\log^2(n)$. We note that weak
or partial recovery results are available for much sparser
regimes, e.g., when the average degree stays bounded as
the number of vertices $n$ increases (see, for example, the
work of [19]). A partial summary of various consistency
results and sparsity regimes in which they hold is given
in Table 1. Existing theoretical results on clustering have
also been centered primarily on isolating fine-grained
community structure in a network. A major contribution
of this work, then, is a formal delineation of hierarchical
structure in a network and a provably consistent
algorithm to uncover communities and subgraphs at
multiple scales.

Average degree               Method                                               Notion of recovery   References
---------------------------  ---------------------------------------------------  -------------------  ----------------
$O(1)$                       semidefinite programming, backtracking random walks  weak recovery        [19], [20], [21]
$\Omega(\log n)$             spectral clustering                                  weak consistency     [13], [14], [22]
$\Omega(\log n)$             modularity maximization                              strong consistency   [7]
$\Omega(\sqrt{n}\log^2 n)$   spectral clustering                                  strong consistency   [18]

TABLE 1: A summary of some existing results on the consistency of recovering the block assignments in stochastic
blockmodel graphs with fixed number of blocks. Weak recovery and weak consistency correspond to the notions
that, in the limit as $n \to \infty$, the proportion of correctly classified vertices is non-zero and approaches one in
probability, respectively. Strong consistency corresponds to the notion that the number of misclassified vertices is
zero in the limit.

Moreover, existing community detection algorithms
have focused mostly on uncovering the subgraphs.
Recently, however, the characterization and classification
of these subgraphs into stochastically similar motifs
has emerged as an important area of ongoing research.
Network comparison is a nascent field, and comparatively
few techniques have thus far been proposed; see
[23], [24], [25], [26], [27], [28], [29]. In particular, in [28], the authors
exhibit a consistent nonparametric test for the equality
of two generating distributions for a pair of random
graphs. The method is based on first embedding the
networks into Euclidean space and then computing $L_2$
distances between the density estimates of the resulting
embeddings. This hypothesis test will play a central role
in our present methodology; see Section 5.

In the present paper, we introduce a robust, scalable
methodology for community detection and community
comparison in graphs, with particular application to social
networks and connectomics. Our techniques build upon
previous work in graph embedding, parameter estimation,
and multi-sample hypothesis testing (see [14], [18], [28],
[29]). Our method proceeds as follows. First, we generate
a low-dimensional representation of the graph [14], cluster
to detect subgraphs of interest [18], and then employ
the nonparametric inference techniques of [28] to identify
heterogeneous subgraph structures. The representation
of a network as a collection of points in Euclidean
space allows for a single framework which combines the
steps of community detection via an adapted spectral
clustering procedure (Algorithm 2) with network
comparison via density estimation. Indeed, the streamlined
clustering algorithm proposed in this paper, Algorithm
2, is well suited to our hierarchical framework, whereas
classical $K$-means may be ill suited to the pathologies
of this model. As a consequence, we are able to present
in this paper a unified inference procedure in which
community detection, motif identification, and larger
network comparison are all seamlessly integrated.

We focus here on a hierarchical version of the classical
stochastic blockmodel [30], [31], in which the larger graph
is composed of smaller subgraphs, each themselves
approximately stochastic blockmodels. We emphasize that
our model and subsequent theory rely heavily on an
affinity assumption at each level of the hierarchy, and we
expect our model to be a reasonable surrogate for a wide
range of real networks, as corroborated by our empirical
results. In our approach, we aim to infer finer-grained
structure at each level of our hierarchy, in effect
performing a "top-down" decomposition. (For a different
generative hierarchical model, in which successive-level
blocks and memberships are the inference tasks, see [32].)
We recall that the stochastic blockmodel (SBM) is an
independent-edge random graph model that posits that
the probability of connection between any two vertices
is a function of the block memberships (i.e., community
memberships) of the vertices. As such, the stochastic
blockmodel is commonly used to model community
structure in graphs. While we establish performance
guarantees for this methodology in the setting of
hierarchical stochastic blockmodels (HSBM), we demonstrate
the wider effectiveness of our algorithm for simultaneous
community detection and classification in the
Drosophila connectome and the very large scale social
network Friendster, which has approximately 60 million
users and 2 billion edges.

We organize the paper as follows. In Section 2, we
provide the key definitions in our model, specifically for
random dot product graphs, SBM graphs, and HSBM
graphs. We summarize recent results on network
comparison from [28], which are critical to our main
algorithm, Algorithm 1. We also present our novel clustering
procedure, Algorithm 2. In Section 3, we demonstrate
how, under mild model assumptions, Algorithm 1 can be
applied to asymptotically almost surely perfectly recover
the motif structure in a two-level HSBM; see Theorem
9. In Section 4, we consider an HSBM with multiple
levels and discuss the recursive nature of Algorithm
1. We also extend Theorem 9 to the multi-level HSBM
and show that, under mild model assumptions, Algorithm
1 again asymptotically almost surely perfectly recovers
the hierarchical motif structure in a multi-level HSBM.
In Section 5, we demonstrate that Algorithm 1 can
be effective in uncovering statistically similar subgraph
structure in real data: first, in the Drosophila connectome,
in which we uncover two repeated motifs; and
second, in the Friendster social network, in which we
decompose the massive network into 15 large subgraphs,
each with hundreds of thousands to millions of vertices.
We identify motifs among these Friendster subgraphs,
and we compare two subgraphs belonging to different
motifs. We further analyze a particular subgraph from
a single motif and demonstrate that we can identify
structure at the second (lower) level. In Section 6, we
conclude by remarking on refinements and extensions
of this approach to community detection.

2 Background 

We situate our approach in the context of hierarchical
stochastic blockmodel graphs. We first define the
stochastic blockmodel as a special case of the more general
random dot product graph model [33], which is itself
a special case of the more general latent position random
graph [34]. We next describe our canonical hierarchical
stochastic blockmodel, which is a stochastic blockmodel
that is endowed with a natural hierarchical structure.

Notation: In what follows, for a matrix $M \in \mathbb{R}^{n \times m}$, we
shall use the notation $M(i,:)$ to denote the $i$-th row of $M$,
and $M(:,i)$ to denote the $i$-th column of $M$. For a symmetric
matrix $M \in \mathbb{R}^{n \times n}$, we shall denote the (ordered)
spectrum of $M$ via $\lambda_1(M) \geq \lambda_2(M) \geq \cdots \geq \lambda_n(M)$.

We begin by defining the random dot product graph. 

Definition 1 ($d$-dimensional Random Dot Product Graph
(RDPG)). Let $F$ be a distribution on a set $\mathcal{X} \subset \mathbb{R}^d$
such that $\langle x, x' \rangle \in [0,1]$ for all $x, x' \in \mathcal{X}$. We say
that $(X, A) \sim \mathrm{RDPG}(F)$ is an instance of a random
dot product graph (RDPG) if $X = [X_1, \ldots, X_n]^T$ with
$X_1, X_2, \ldots, X_n \overset{\text{i.i.d.}}{\sim} F$, and $A \in \{0,1\}^{n \times n}$ is a symmetric
hollow matrix satisfying

$$P[A \mid X] = \prod_{i > j} (X_i^T X_j)^{A_{ij}} (1 - X_i^T X_j)^{1 - A_{ij}}.$$
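As an illustration of Definition 1, the following is a minimal sketch of how one might sample an RDPG in NumPy. The function name `sample_rdpg` and the choice of $F$ (uniform on $[0.2, 0.6]^2$) are assumptions made only for this example; they do not come from the paper.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Sample a symmetric, hollow adjacency matrix A given latent positions X (n x d)."""
    P = X @ X.T                                    # P[i, j] = <X_i, X_j>
    if P.min() < 0 or P.max() > 1:
        raise ValueError("all inner products must lie in [0, 1]")
    n = X.shape[0]
    # Independent Bernoulli(P[i, j]) draws, kept only strictly above the diagonal.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)           # symmetrize; diagonal stays zero

rng = np.random.default_rng(0)
# Hypothetical F: uniform on [0.2, 0.6]^2, so inner products lie in [0.08, 0.72].
X = rng.uniform(0.2, 0.6, size=(200, 2))
A = sample_rdpg(X, rng)
```

Conditional on $X$, each edge above the diagonal is an independent Bernoulli trial, which is exactly the product form of $P[A \mid X]$ in the definition.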

Remark 1. We note that non-identifiability is an intrinsic
property of random dot product graphs. Indeed, for
any matrix $X$ and any orthogonal matrix $W$, the inner
product between any rows $i, j$ of $X$ is identical to that
between the rows $i, j$ of $XW$. Hence, for any probability
distribution $F$ on $\mathcal{X}$ and unitary operator $U$, the adjacency
matrices $A \sim \mathrm{RDPG}(F)$ and $B \sim \mathrm{RDPG}(F \circ U)$
are identically distributed.

We denote the second moment matrix for the vectors $X_i$
by $\Delta = \mathbb{E}(X_1 X_1^T)$; we assume that $\Delta$ is rank $d$.

The stochastic blockmodel can be framed in the context 
of random dot product graphs as follows. 

Definition 2. We say that an $n$ vertex graph $(X, A) \sim
\mathrm{RDPG}(F)$ is a (positive semidefinite) stochastic blockmodel
(SBM) with $K$ blocks if the distribution $F$ is a
mixture of $K$ point masses,

$$F = \sum_{i=1}^{K} \pi(i)\, \delta_{\xi_i},$$

where $\pi \in (0,1)^K$ satisfies $\sum_i \pi(i) = 1$, and the distinct
latent positions are given by $\xi = [\xi_1, \xi_2, \ldots, \xi_K]^T \in
\mathbb{R}^{K \times d}$. In this case, we write $G \sim \mathrm{SBM}(n, \pi, \xi\xi^T)$, and
we refer to $\xi\xi^T \in \mathbb{R}^{K \times K}$ as the block probability matrix of $G$.


Moreover, any stochastic blockmodel graph whose
block probability matrix $B$ is positive semidefinite can
be formulated as a random dot product graph in which
the point masses are the rows of $B^{1/2}$.
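The correspondence between a positive semidefinite block probability matrix $B$ and latent positions given by the rows of $B^{1/2}$ can be sketched as follows. The helper name `sbm_latent_positions`, the particular matrix `B`, and the block proportions `pi` are hypothetical values chosen for illustration.

```python
import numpy as np

def sbm_latent_positions(B, tol=1e-10):
    """Return the symmetric square root B^{1/2}; its rows serve as latent positions."""
    evals, evecs = np.linalg.eigh(B)
    if evals.min() < -tol:
        raise ValueError("B must be positive semidefinite")
    # V diag(sqrt(lambda)) V^T, clipping tiny negative eigenvalues to zero.
    return (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T

B = np.array([[0.5, 0.2],
              [0.2, 0.4]])            # hypothetical PSD block probability matrix
xi = sbm_latent_positions(B)          # row k is the latent position of block k

rng = np.random.default_rng(0)
pi = np.array([0.6, 0.4])             # hypothetical block proportions
block = rng.choice(len(pi), size=10, p=pi)
X = xi[block]                         # latent position of each vertex
```

With this choice, $\langle X_i, X_j \rangle$ equals the entry of $B$ indexed by the blocks of vertices $i$ and $j$, so sampling edges from $XX^T$ reproduces the SBM.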

Many real data networks exhibit hierarchical community
structure (for social network examples, see [32], [35], [36], [37],
[38], [39], [40]; for biological examples, see [4], [5], [6]). To
incorporate hierarchical structure into the above RDPG and
SBM framework, we first consider SBM graphs endowed
with the following specific hierarchical structure.

Definition 3 (2-level Hierarchical stochastic blockmodel
(HSBM)). We say that $(X, A) \sim \mathrm{RDPG}(F)$ is an
instantiation of a $D$-dimensional 2-level hierarchical stochastic
blockmodel with parameters $(n, \pi, \{\pi_i\}_{i=1}^R, \xi)$ if $F$ can be
written as the mixture

$$F = \sum_{i=1}^{R} \pi(i)\, F_i,$$

where $\pi \in (0,1)^R$ satisfies $\sum_i \pi(i) = 1$, and for each
$i \in [R]$, $F_i$ is itself a mixture of point mass distributions

$$F_i = \sum_{j=1}^{K_i} \pi_i(j)\, \delta_{\xi_i^{(2)}(:,j)},$$

where $\pi_i \in (0,1)^{K_i}$ satisfies $\sum_j \pi_i(j) = 1$. The distinct
latent positions are then given by $\xi = [\xi_1^{(2)} | \cdots | \xi_R^{(2)}]^T \in
\mathbb{R}^{(\sum_i K_i) \times D}$. We then write

$$G \sim \mathrm{HSBM}(n, \pi, \{\pi_i\}_{i=1}^R, \xi\xi^T).$$

Simply stated, an HSBM graph is an RDPG for which the
vertex set can be partitioned into $R$ subgraphs, each of
which is itself an SBM; here $(\xi_i^{(2)})^T \in \mathbb{R}^{K_i \times D}$ denotes
the matrix whose rows are the latent positions characterizing
the block probability matrix for subgraph $i$.

Throughout this manuscript, we will make a number 
of simplifying assumptions on the underlying HSBM 
in order to facilitate theoretical developments and ease 
exposition. 

Assumption 1. (Affinity HSBM) We further assume that
for the distinct latent positions, if we define

$$q := \min_{i,h,l} \langle \xi_i^{(2)}(:,l), \xi_i^{(2)}(:,h) \rangle; \tag{1}$$

$$p := \max_{i \neq j} \max_{l,h} \langle \xi_i^{(2)}(:,l), \xi_j^{(2)}(:,h) \rangle, \tag{2}$$

then $p < q$. Simply stated, we require that within each
subgraph, the connections are comparatively dense, and
between two subgraphs, comparatively sparse.
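The affinity condition can be checked numerically for a given set of latent positions. The sketch below assumes that each subgraph is supplied as a matrix whose rows are its latent positions (i.e., $(\xi_i^{(2)})^T$); the function name `affinity_constants` and the numerical values are hypothetical.

```python
import numpy as np
from itertools import combinations

def affinity_constants(blocks):
    """blocks[i] is a K_i x D array whose rows are the latent positions of subgraph i.

    Returns (p, q): the largest between-subgraph inner product and the
    smallest within-subgraph inner product, as in equations (1) and (2).
    """
    q = min((b @ b.T).min() for b in blocks)
    p = max((a @ b.T).max() for a, b in combinations(blocks, 2))
    return p, q

# Hypothetical latent positions for R = 2 subgraphs with K = 2 blocks each, D = 4.
xi_1 = np.array([[0.60, 0.30, 0.05, 0.05],
                 [0.30, 0.60, 0.05, 0.05]])
xi_2 = np.array([[0.05, 0.05, 0.60, 0.30],
                 [0.05, 0.05, 0.30, 0.60]])
p, q = affinity_constants([xi_1, xi_2])
is_affinity = p < q
```

Here the within-subgraph inner products are at least 0.365 while every between-subgraph inner product is 0.09, so this toy configuration satisfies $p < q$.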
Assumption 2. (Subspace structure) To simplify exposition,
and to assure the condition that
$\langle \xi_i^{(2)}(:,l), \xi_j^{(2)}(:,h) \rangle \leq p$ for $1 \leq i \neq j \leq R$ and $l, h \in [K]$, we impose
additional structure on the matrix of latent positions in
the HSBM. To wit, we write $\xi \in \mathbb{R}^{RK \times D}$ explicitly as

$$\xi = \begin{bmatrix} (\xi_1^{(2)})^T \\ (\xi_2^{(2)})^T \\ \vdots \\ (\xi_R^{(2)})^T \end{bmatrix}
= \begin{bmatrix} \xi_1^T & 0 & \cdots & 0 \\ 0 & \xi_2^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \xi_R^T \end{bmatrix} + \mathcal{A}^{(2)}
= Z^{(2)} + \mathcal{A}^{(2)},$$

where $Z^{(2)} \circ \mathcal{A}^{(2)} = 0$ ($\circ$ being the Hadamard product)
and the entries of $\mathcal{A}^{(2)}$ are chosen to make the off block-diagonal
elements of the corresponding edge probability
matrix $\xi\xi^T$ bounded above by the absolute constant $p$.
Moreover, to ease exposition in this 2-level setting, we
will assume that for each $i \in [R]$, $\xi_i^T \in \mathbb{R}^{K \times d}$, so that
$D = Rd$. In practice, the subspaces pertaining to the
individual subgraphs need not be the same rank, and
the subgraphs need not have the same number of blocks
(see Section 5 for examples of $K$ and $d$ varying across
subgraphs).

Remark 2. Note that $G \sim \mathrm{HSBM}(n, \pi, \{\pi_i\}_{i=1}^R, \xi\xi^T)$ can
be viewed as an SBM graph with $\sum_i K_i$ blocks; $G \sim
\mathrm{SBM}(n, (\pi(1)\pi_1, \pi(2)\pi_2, \ldots, \pi(R)\pi_R), \xi\xi^T)$. However, in
this paper we will consider blockmodels with statistically
similar subgraphs across blocks, and in general,
such models can be parameterized by far fewer than
$\sum_i K_i$ blocks. In contrast, when the graph is viewed as
an RDPG, the full $\sum_i K_i$ dimensions may be needed,
because our affinity assumption necessitates a growing
number of dimensions to accommodate the potentially
growing number of subgraphs. Because latent positions
associated to vertices in different subgraphs must exhibit
near orthogonality, teasing out the maximum possible
number of subgraphs for a given embedding dimension
is, in essence, a cone-packing problem; while undoubtedly
of interest, we do not pursue this problem further
in this manuscript.

Given a graph from this model satisfying Assumptions
1 and 2, we use Algorithm 1 to uncover the hidden
hierarchical structure. Furthermore, we note that Algorithm
1 can be applied to uncover hierarchical structure
in any hierarchical network, regardless of HSBM model
assumptions. However, our theoretical contributions are
proven under HSBM model assumptions.

A key component of this algorithm is the computation
of the adjacency spectral embedding [14], defined as
follows.

Definition 4. Given an adjacency matrix $A \in \{0,1\}^{n \times n}$
of a $d$-dimensional $\mathrm{RDPG}(F)$, the adjacency spectral embedding
(ASE) of $A$ into $\mathbb{R}^d$ is given by $\hat{X} = U_A S_A^{1/2}$, where

$$|A| = [U_A | \tilde{U}_A][S_A \oplus \tilde{S}_A][U_A | \tilde{U}_A]^T$$

is the spectral decomposition of $|A| = (A^T A)^{1/2}$, $S_A$ is
the diagonal matrix with the (ordered) $d$ largest eigenvalues
of $|A|$ on its diagonal, and $U_A \in \mathbb{R}^{n \times d}$ is the matrix
whose columns are the corresponding orthonormal
eigenvectors of $|A|$.
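A minimal NumPy sketch of the adjacency spectral embedding follows. It relies on the fact that, for symmetric $A$, the matrix $|A| = (A^T A)^{1/2}$ shares eigenvectors with $A$ and has eigenvalues $|\lambda_i(A)|$. The function name and the noise-free sanity check on a hypothetical edge probability matrix are illustrative assumptions, not the authors' code.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Return the n x d embedding U_A S_A^{1/2} of a symmetric matrix A."""
    evals, evecs = np.linalg.eigh(A)
    # For symmetric A, |A| = (A^T A)^{1/2} has the same eigenvectors as A
    # and eigenvalues |lambda_i(A)|, so eigenpairs are ranked by absolute value.
    top = np.argsort(np.abs(evals))[::-1][:d]
    return evecs[:, top] * np.sqrt(np.abs(evals[top]))

rng = np.random.default_rng(1)
X = rng.uniform(0.2, 0.6, size=(50, 2))   # hypothetical latent positions
P = X @ X.T                               # noise-free edge probability matrix
X_hat = adjacency_spectral_embedding(P, d=2)
gram_error = np.abs(X_hat @ X_hat.T - P).max()
```

Applied to the noise-free matrix $P = XX^T$, the embedding recovers $X$ up to an orthogonal transformation, so `X_hat @ X_hat.T` matches `P` to numerical precision. On a sampled adjacency matrix the recovery is only approximate.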


Algorithm 1 Detecting hierarchical structure for graphs

Input: Adjacency matrix $A \in \{0,1\}^{n \times n}$ for a latent
position random graph.

Output: Subgraphs and characterization of their
dissimilarity

while Cluster size exceeds threshold do

Step 1: Compute the adjacency spectral embedding
(see Definition 4) $\hat{X}$ of $A$ into $\mathbb{R}^D$;

Step 2: Cluster $\hat{X}$ to obtain subgraphs $H_1, \ldots, H_R$
using the procedure described in Algorithm 2;

Step 3: For each $i \in [R]$, compute the adjacency
spectral embedding for each subgraph $H_i$ into $\mathbb{R}^d$,
obtaining $\hat{X}_{H_i}$;

Step 4: Compute $S := [T_{n_r, n_s}(\hat{X}_{H_r}, \hat{X}_{H_s})]$, where $T$ is
the test statistic in Theorem 8 and $n_r$ is the number of
vertices of $H_r$, producing a pairwise
dissimilarity matrix on induced subgraphs;

Step 5: Cluster induced subgraphs into motifs using
the dissimilarities given in $S$; e.g., use a hierarchical
clustering algorithm to cluster the rows of $S$ or the
matrix of associated p-values;

Step 6: Recurse on a representative subgraph from
each motif (e.g., the largest subgraph), embedding
into $\mathbb{R}^d$ in Step 1 (not $\mathbb{R}^D$);

end while


It is proved in [14], [41] that the adjacency spectral
embedding provides a consistent estimate of the true latent
positions in random dot product graphs. The key to this
result is a tight concentration, in Frobenius norm, of the
adjacency spectral embedding, $\hat{X}$, about the true latent
positions $X$. This bound is strengthened in [18], wherein
the authors show tight concentration, in $2 \to \infty$ norm,
of $\hat{X}$ about $X$. The $2 \to \infty$ concentration provides a
significant improvement over results that employ bounds
on the Frobenius norm of the residuals between the
estimated and true latent positions, namely $\|\hat{X} - X\|_F$. The
Frobenius norm bounds are potentially sub-optimal for
subsequent inference, because one cannot rule out that
a diminishing but positive proportion of the embedded
points contribute disproportionately to the global error.

However, the $2 \to \infty$ norm concentration result in
[18] relies on the assumption that the eigenvalues of
$\mathbb{E}[X_1 X_1^T]$ are distinct, which is often violated in the
setting of repeated motifs for an HSBM. One of the main
contributions of this paper is a further strengthening of
the results of [18]: in Theorem 5, we prove that $\hat{X}$
concentrates about $X$ in $2 \to \infty$ norm with far less restrictive
assumptions on the eigenstructure of $\mathbb{E}[X_1 X_1^T]$.

In this paper, if $E_n$ is a sequence of events, we say
that $E_n$ occurs asymptotically almost surely if $P(E_n) \to 1$
as $n \to \infty$; more precisely, we say that $E_n$ occurs
asymptotically almost surely if for any fixed $c > 0$,
there exists $n_0(c)$ such that if $n > n_0(c)$ and $\eta$ satisfies
$n^{-c} < \eta < 1/2$, then $P(E_n)$ is at least $1 - \eta$. The
theorem below asserts that the $2 \to \infty$ norm of the
differences between true and estimated latent positions
is of a certain order asymptotically almost surely. In the
appendix, we state and prove a generalization of this
result in the non-dense regime.

Theorem 5. Let $(A, X) \sim \mathrm{RDPG}(F)$ where the second
moment matrix $\Delta = \mathbb{E}(X_1 X_1^T)$ is of rank $d$. Let $E_n$ be the
event that there exists a rotation matrix $W$ such that

$$\|\hat{X} - XW\|_{2 \to \infty} = \max_i \|\hat{X}(i,:) - X(i,:)W\| \leq \frac{C d^{1/2} \log^2 n}{\sqrt{n}},$$

where $C$ is some fixed constant. Then $E_n$ occurs
asymptotically almost surely.

We stress that because of this bound on the $2 \to \infty$ norm,
we have far greater control of the errors in individual
rows of the residuals $\hat{X} - X$ than possible with existing
Frobenius norm bounds. One consequence of this control
is that an asymptotically perfect clustering procedure for
$X$ will yield an equivalent asymptotically almost surely
perfect clustering of $\hat{X}$. This insight is the key to proving
Lemma 6; see the appendix for full detail. A further
consequence of Theorem 5, in the setting of random dot
product graphs without a canonical block structure, is
that one can choose a loss function with respect to which 
ASE followed by a suitable clustering yields optimal 
clusters MM- This implies that meaningful clustering 
can be pursued even when no canonical hierarchical 
structure exists. 
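To make the distinction between the two norms concrete, the following sketch computes the $2 \to \infty$ norm (the largest row norm) and the Frobenius norm of a residual after aligning by an orthogonal Procrustes rotation. The alignment step, the synthetic "estimate" `X_hat`, and all numerical values are assumptions for illustration only; the theorem itself asserts the existence of a suitable $W$ rather than prescribing how to compute it.

```python
import numpy as np

def two_to_infinity_norm(M):
    """Largest Euclidean norm among the rows of M."""
    return np.sqrt((M ** 2).sum(axis=1)).max()

def procrustes_rotation(X, X_hat):
    """Orthogonal W minimizing ||X W - X_hat||_F, via an SVD of X^T X_hat."""
    U, _, Vt = np.linalg.svd(X.T @ X_hat)
    return U @ Vt

rng = np.random.default_rng(2)
X = rng.uniform(0.2, 0.6, size=(100, 2))           # hypothetical true positions
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# Hypothetical estimate: a rotated copy of X plus small noise.
X_hat = X @ Q + 0.01 * rng.standard_normal(X.shape)

W = procrustes_rotation(X, X_hat)
residual = X_hat - X @ W
row_error = two_to_infinity_norm(residual)         # worst single-vertex error
frob_error = np.linalg.norm(residual)              # aggregate error over all vertices
```

The row-wise error never exceeds the Frobenius error, but a small Frobenius error alone does not preclude a few rows from carrying most of it; a bound on `row_error` controls every vertex simultaneously.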

Having successfully embedded the graph $G$ into $\mathbb{R}^D$
through the adjacency spectral embedding, we next cluster
the vertices of $G$, i.e., rows of $\hat{X}$. For each $i \in [R]$, we
define

$$(\hat{\xi}_i^{(2)})^T$$

to be the matrix whose rows are the rows in $\hat{X}$ corresponding
to the latent positions in the rows of $(\xi_i^{(2)})^T$.
Our clustering algorithm proceeds as follows. With
Assumptions 1 and 2, and further assuming that $R$ is
known, we first build a "seed" set $S_n$ as follows.
Initialize $S_0$ to be a random sampling of $R$ rows of $\hat{X}$. For
each $i \in [n]$, let $y, z \in S_{i-1}$ be distinct and such that

$$\langle y, z \rangle = \max_{x \neq w \in S_{i-1}} \langle x, w \rangle.$$

If $\max_{x \in S_{i-1}} \langle \hat{X}(i,:), x \rangle < \langle y, z \rangle$, then add $\hat{X}(i,:)$ to $S_{i-1}$,
and remove $y$ from $S_{i-1}$; i.e.,

$$S_i = (S_{i-1} \setminus \{y\}) \cup \{\hat{X}(i,:)\}.$$

If $\max_{x \in S_{i-1}} \langle \hat{X}(i,:), x \rangle \geq \langle y, z \rangle$, then set $S_i = S_{i-1}$.

Iterate this procedure until all $n$ rows of $\hat{X}$ have been
considered. We show in Proposition 19 in the appendix
that $S_n$ is composed of exactly one row from each
$\hat{\xi}_i^{(2)}$. Given the seed set $S_n = \{s_1, s_2, \ldots, s_R\}$, we then
initialize $R$ clusters $C_1, C_2, \ldots, C_R$ via $C_i = \{s_i\}$ for each
$i \in [R]$. Lastly, for $i \in [n]$, assign $\hat{X}(i,:)$ to $C_j$ if

$$\arg\max_{s \in S_n} \langle \hat{X}(i,:), s \rangle = s_j.$$

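Algorithm 2 Seeded nearest neighbor subspace clustering

Initialize $S_0$ to be a random sampling of $R$ rows of $\hat{X}$.

for all $i \in [n]$ do

    Let $y, z \in S_{i-1}$ be distinct and such that $\langle y, z \rangle = \max_{x \neq w \in S_{i-1}} \langle x, w \rangle$

    if $\max_{x \in S_{i-1}} \langle \hat{X}(i,:), x \rangle < \langle y, z \rangle$ then
        $S_i = (S_{i-1} \setminus \{y\}) \cup \{\hat{X}(i,:)\}$
    else
        $S_i = S_{i-1}$
    end if
end for

Denote $S_n = \{s_1, \ldots, s_R\}$

Initialize $R$ clusters $C_1 = \{s_1\}, \ldots, C_R = \{s_R\}$
for all $i \in [n]$ do

    Let $\hat{\tau}(i) = \arg\max_{j \in [R]} \langle \hat{X}(i,:), s_j \rangle$

    $C_{\hat{\tau}(i)} = C_{\hat{\tau}(i)} \cup \{\hat{X}(i,:)\}$

end for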
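The seeding and assignment steps of Algorithm 2 can be sketched in NumPy as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the function name, the tie-breaking (which of $y, z$ is removed), and the noise-free toy latent positions are assumptions, and the sketch requires $R \geq 2$.

```python
import numpy as np

def seeded_subspace_clustering(X_hat, R, rng):
    """Seeded nearest neighbor subspace clustering on the rows of X_hat (R >= 2)."""
    n = X_hat.shape[0]
    seeds = X_hat[rng.choice(n, size=R, replace=False)].copy()   # S_0
    for i in range(n):
        gram = seeds @ seeds.T
        np.fill_diagonal(gram, -np.inf)          # compare distinct seeds only
        y, _ = np.unravel_index(np.argmax(gram), gram.shape)
        # If row i is less aligned with every seed than the two most aligned
        # seeds are with each other, swap it in for one of those two seeds.
        if (seeds @ X_hat[i]).max() < gram.max():
            seeds[y] = X_hat[i]
    labels = np.argmax(X_hat @ seeds.T, axis=1)  # nearest seed by inner product
    return labels, seeds

rng = np.random.default_rng(3)
# Hypothetical latent positions: two subgraphs (rows 0-1 and rows 2-3), D = 4.
xi = np.array([[0.60, 0.30, 0.05, 0.05],
               [0.30, 0.60, 0.05, 0.05],
               [0.05, 0.05, 0.60, 0.30],
               [0.05, 0.05, 0.30, 0.60]])
block = rng.integers(0, 4, size=200)
subgraph = block // 2                            # true subgraph of each vertex
labels, seeds = seeded_subspace_clustering(xi[block], R=2, rng=rng)
agreement = (labels == subgraph).mean()          # 1.0 or 0.0 up to label swap
```

Each pass over the rows costs $O(R^2 D)$ per vertex, so the procedure is linear in $n$ for fixed $R$ and $D$. In this noise-free toy example the recovered labels coincide with the true subgraphs up to relabeling.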


As encapsulated in the next lemma, this procedure,
summarized in Algorithm 2, yields an asymptotically
perfect clustering of the rows of $\hat{X}$ for HSBMs under
mild model assumptions.

Lemma 6. Let $G \sim \mathrm{HSBM}(n, \pi, \{\pi_i\}_{i=1}^R, \xi\xi^T)$ satisfy
Assumptions 1 and 2, and suppose further that $\pi_{\min} :=
\min_i \pi(i) > 0$. Then asymptotically almost surely,

$$\min_{\sigma \in S_R} \sum_i \mathbb{1}\{\hat{\tau}(i) \neq \sigma(\tau(i))\} = 0,$$

where $\tau : [n] \to [R]$ is the true assignment of vertices to
subgraphs, $\hat{\tau}$ is the assignment given by our clustering
procedure above, and the minimum is taken over permutations $\sigma$ of $[R]$.

Under only our "affinity assumption" (namely, that $q >
p$), $k$-means cannot provide a provably perfect clustering
of vertices. This is a consequence of the fact that the number
of clusters we seek is far less than the total number of
distinct latent positions. As a notional example, consider
a graph with two subgraphs, each of which is an SBM
with two blocks. The representation of such a graph in
terms of its latent positions is illustrated in Figure 1. We
are interested in clustering the vertices into subgraphs,
i.e., we want to assign the points to their corresponding
cones (depicted via the shaded light blue and pink
areas). If we denote by $\pi_1$, $\pi_2$, and $\pi_3$ the fraction of red,
green, and blue colored points, respectively, then a $k$-means
clustering of the colored points into two clusters
might, depending on the distance between the points
and $\pi_1, \pi_2, \pi_3$, yield two clusters with cluster centroids
inside the same cone, thereby assigning vertices from
different subgraphs to the same cluster. That is to say, if
the subgraphs' sizes in Figure 1 are sufficiently unbalanced,
then $k$-means clustering could yield a clustering
in which the yellow, green, and blue colored points are
assigned to one cluster, and the red colored points are
assigned to another cluster. In short, $k$-means is not a
subspace clustering algorithm, and the subspace and
affinity assumptions made in our HSBM formulation
(Assumptions 1, 2, 3 and 4) render $k$-means suboptimal
for uncovering the subgraph structure in our model.
Understanding the structure uncovered by $k$-means in
our HSBM setting, while of interest, is beyond the scope
of this manuscript.

Fig. 1: Subgraphs vs. clustering: Note that if the fraction
of points in the pink cone is sufficiently large, $K$-means
clustering (with $K = 2$) will not cluster the vertices into
the canonical subgraphs.

Note that $p$ being small ensures that the subgraphs of
interest, namely the $\xi_i$'s, lie in nearly orthogonal subspaces
of $\mathbb{R}^D$. Our clustering procedure is thus similar
in spirit to the subspace clustering procedure of [42].


Remark 3. In what follows, we will assume that $R$,
the number of induced SBM subgraphs in $G$, and $D$
are known a priori. In practice, however, we often
need to estimate both $D$ (prior to embedding) and $R$
(prior to clustering). To estimate $D$, we can use singular
value thresholding [43] to estimate $D$ from a partial
scree plot. While we can estimate $R$ via traditional
techniques (i.e., measuring the validity of the clustering
provided by Algorithm 2 over a range of $R$ via silhouette
width; see [44, Chapter 3]), we propose an alternate
estimation procedure tuned to our algorithm. For each
$k = 2, 3, \ldots, D$, we run Algorithm 2 with $R = k$, and
repeat this procedure $n_{MC}$ times. For each $k$, and each
$i = 1, 2, \ldots, n_{MC}$, compute

$$\phi_i^{(k)} = \max_{s, t \in S_n,\ s \neq t} \langle s, t \rangle,$$

and compute

$$\phi^{(k)} = \frac{1}{n_{MC}} \sum_{i=1}^{n_{MC}} \phi_i^{(k)}.$$

If the true $R$ is greater than or equal to $k$, then we
expect $\phi^{(k)}$ to be small by construction. If $k$ is bigger
than the true $R$, then at least two of the vectors in $S_n$
would lie in the same subspace; i.e., their dot product
would be large. Hence, we would expect the associated
$\phi^{(k)}$ to be large. We employ standard "elbow-finding"
methodologies [45] to find the value of $k$ for which $\phi^{(k)}$
goes from small to large, and this $k$ will be our estimate
of $R$. As Algorithm 2 has running time linear in $n$, with a
bounded number of Monte Carlo iterates, this estimation
procedure also has running time linear in $n$.

[Figure: Subgraphs found via clustering]

Fig. 2: Depiction of the adjacency matrix of a two-level
HSBM graph with 3 distinct motifs. In the above
4100x4100 grid, if an edge exists in $G$ between vertices
$i$ and $j$, then the corresponding $(i,j)$-th cell in the grid
is black. The cell is white if no edge is present. The
subgraphs corresponding to motifs are $H_1$, $H_4$, and $H_8$;
$H_2$ and $H_7$; and $H_3$, $H_5$, and $H_6$.

Post-clustering, a further question of interest is to determine
which of those induced subgraphs are structurally
similar. We define a motif as a collection of distributionally
"equivalent" RDPG graphs, in a sense that we will make
precise in Definition 7. An example of
an HSBM graph with 8 blocks in 3 motifs is presented in
Figure 2. More precisely, we define a motif (namely, an
equivalence class of random graphs) as follows.

Definition 7. Let $(A, X) \sim \mathrm{RDPG}(F)$ and $(B, Y) \sim
\mathrm{RDPG}(G)$. We say that $A$ and $B$ are of the same motif
if there exists a unitary transformation $U$ such that
$F = G \circ U$.

To detect the presence of motifs among the induced
subgraphs $\{H_1, \ldots, H_R\}$, we adopt the nonparametric
test procedure of [28] to determine whether two RDPG
graphs have the same underlying distribution. The principal
result of that work is the following:

Theorem 8. Let $(A, X) \sim \mathrm{RDPG}(F)$ and $(B, Y) \sim
\mathrm{RDPG}(G)$ be $d$-dimensional random dot product graphs.
Consider the hypothesis test

$$H_0 : F = G \circ U \quad \text{against} \quad H_A : F \neq G \circ U.$$

Denote by $\hat{X} = \{\hat{X}_1, \ldots, \hat{X}_n\}$ and $\hat{Y} = \{\hat{Y}_1, \ldots, \hat{Y}_m\}$ the
adjacency spectral embedding of $A$ and $B$, respectively. Define
the test statistic $T_{n,m} = T_{n,m}(\hat{X}, \hat{Y})$ as follows:

$$T_{n,m}(\hat{X}, \hat{Y}) = \frac{1}{n(n-1)} \sum_{j \neq i} \kappa(\hat{X}_i, \hat{X}_j)
- \frac{2}{mn} \sum_{i=1}^{n} \sum_{k=1}^{m} \kappa(\hat{X}_i, \hat{Y}_k)
+ \frac{1}{m(m-1)} \sum_{l \neq k} \kappa(\hat{Y}_k, \hat{Y}_l) \tag{3}$$

where $\kappa$ is a radial basis kernel, e.g., $\kappa = \exp(-\|\cdot - \cdot\|^2/\sigma^2)$.
Suppose that $m, n \to \infty$ and $m/(m+n) \to \rho \in (0,1)$. Then
under the null hypothesis of $F = G \circ U$,

$$|T_{n,m}(\hat{X}, \hat{Y}) - T_{n,m}(X, YW)| \xrightarrow{\text{a.s.}} 0 \tag{4}$$

and $|T_{n,m}(X, YW)| \to 0$ as $n, m \to \infty$, where $W$ is any
orthogonal matrix such that $F = G \circ W$. In addition, under the
alternative hypothesis of $F \neq G \circ U$, there exists an orthogonal
matrix $W \in \mathbb{R}^{d \times d}$, depending on $F$ and $G$ but independent
of $m$ and $n$, such that

$$|T_{n,m}(\hat{X}, \hat{Y}) - T_{n,m}(X, YW)| \xrightarrow{\text{a.s.}} 0, \tag{5}$$

and $|T_{n,m}(X, YW)| \to c > 0$ as $n, m \to \infty$.

Theorem 8 allows us to formulate the problem of detecting
when two graphs $A$ and $B$ belong to the same
motif as a hypothesis test. Furthermore, under appropriate
conditions on $\kappa$ (conditions satisfied when $\kappa$ is
a Gaussian kernel with bandwidth $\sigma^2$ for fixed $\sigma$), the
hypothesis test is consistent for any two arbitrary but
fixed distributions $F$ and $G$, i.e., $T_{n,m}(\hat{X}, \hat{Y}) \to 0$ as
$n, m \to \infty$ if and only if $F = G$. We are presently
working to extend results on the consistency of adjacency
spectral embedding and two-sample hypothesis
testing (i.e., Theorem 8 and [29]) from the current setting
of random dot product graphs to more general random
graph models, with particular attention to scale-free
and small-world graphs. However, the extension
of these techniques to more general random graphs is
beset by intrinsic difficulties. For example, even extending
motif detection to general latent position random
graphs is confounded by the non-identifiability inherent
to graphon estimation. Complicating matters further,
there are few random graph models that are known
to admit parsimonious sufficient statistics suitable for
subsequent classical estimation procedures.


3 Detecting hierarchical structure in 
the HSBM 

Combining the above inference procedures, our algorithm, as depicted in Algorithm 1, proceeds as follows. We first cluster the adjacency spectral embedding of the graph $G$ to obtain the first-order, large-scale block memberships. We then employ the nonparametric test procedure outlined in [28] to determine similar induced subgraphs (motifs) associated with these blocks. We iterate this process to obtain increasingly refined estimates of the overall graph structure. In Step 6 of Algorithm 1, we recurse on a representative subgraph (e.g., the largest subgraph) within each motif, embedding the subgraph into its own lower-dimensional space (not $\mathbb{R}^{D}$) as in Step 1 of Algorithm 1. Ideally, we would leverage the full collection of subgraphs from each motif in this recursion step. However, the subgraphs within a motif may be of differing orders, and meaningfully averaging or aligning them (see [46]) requires novel regularization which, though interesting, is beyond the scope of the present manuscript.

Fig. 3: Heatmap depicting the dissimilarity matrix $\hat S$ produced by Algorithm 1 for the 2-level HSBM depicted in Figure 2. We apply hierarchical clustering to $\hat S$ (with the resulting dendrogram displayed), which recovers the three distinct motifs.

Before presenting our main theorem in the 2-level setting, Theorem 9, we illustrate the steps of our method in the analysis of the 2-level synthetic HSBM graph depicted in Figure 2. The graph has 4100 vertices belonging to 8 different blocks of size $n = (300, 600, 600, 600, 700, 600, 300, 400)$ with three distinct motifs. The block probability matrices corresponding to these motifs are given by



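$$B_1 = \begin{bmatrix} 0.3 & 0.25 & 0.25 \\ 0.25 & 0.3 & 0.25 \\ 0.25 & 0.25 & 0.7 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0.4 & 0.25 & 0.25 \\ 0.25 & 0.4 & 0.25 \\ 0.25 & 0.25 & 0.4 \end{bmatrix}, \quad B_3 = \begin{bmatrix} 0.25 & 0.2 & 0.2 \\ 0.2 & 0.8 & 0.2 \\ 0.2 & 0.2 & 0.25 \end{bmatrix},$$

and the inter-block edge probability is bounded by $p = 0.01$.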
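To make the simulation setting concrete, the following Python sketch samples a scaled-down graph with the same three motif matrices. Several choices here are assumptions made only for illustration: the subgraph sizes are the stated sizes divided by 10, the assignment of motifs to the 8 subgraphs (`MOTIF_OF_SUBGRAPH`) is hypothetical, vertices are split evenly across the three blocks within a subgraph, and the cross-subgraph edge probability is set equal to the stated bound 0.01.

```python
import random

# Motif block probability matrices from the text.
B1 = [[0.30, 0.25, 0.25], [0.25, 0.30, 0.25], [0.25, 0.25, 0.70]]
B2 = [[0.40, 0.25, 0.25], [0.25, 0.40, 0.25], [0.25, 0.25, 0.40]]
B3 = [[0.25, 0.20, 0.20], [0.20, 0.80, 0.20], [0.20, 0.20, 0.25]]
P_ACROSS = 0.01  # assumption: use the stated bound as the exact probability

SIZES = [30, 60, 60, 60, 70, 60, 30, 40]            # stated sizes / 10
MOTIF_OF_SUBGRAPH = [B1, B1, B2, B3, B2, B3, B2, B1]  # hypothetical assignment


def sample_two_level_hsbm(sizes, motifs, rng):
    """Sample a symmetric, hollow 0/1 adjacency matrix and subgraph labels."""
    subgraph, block = [], []
    for i, size in enumerate(sizes):
        k = len(motifs[i])
        for v in range(size):
            subgraph.append(i)
            block.append(v * k // size)  # even split across the k blocks
    n = len(subgraph)
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if subgraph[u] == subgraph[v]:
                prob = motifs[subgraph[u]][block[u]][block[v]]
            else:
                prob = P_ACROSS
            if rng.random() < prob:
                A[u][v] = A[v][u] = 1
    return A, subgraph


def edge_density(A, subgraph, within):
    """Fraction of present edges among within- or across-subgraph pairs."""
    edges = pairs = 0
    for u in range(len(A)):
        for v in range(u + 1, len(A)):
            if (subgraph[u] == subgraph[v]) == within:
                pairs += 1
                edges += A[u][v]
    return edges / pairs


A, subgraph = sample_two_level_hsbm(SIZES, MOTIF_OF_SUBGRAPH, random.Random(0))
within = edge_density(A, subgraph, True)
across = edge_density(A, subgraph, False)
print(round(within, 3), round(across, 3))
```

The within-subgraph density should be far larger than the across-subgraph density, which is the affinity structure the first clustering step relies on.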


The algorithm does indeed detect three motifs, as depicted in Figure 3. The figure presents a heat map depiction of $\hat S$, and the similarity of the communities is represented on the spectrum between white and red, with white representing highly similar communities and red representing highly dissimilar communities. From the figure, we correctly see there are three distinct motif
communities, {H 3 ,H r }, {//, . H 2 , H s }, and {f/ 4 . //,,}, 

corresponding to stochastic blockmodels with the following block probability matrices

$$\hat B_1 = \begin{bmatrix} 0.27 & 0.25 \\ 0.25 & 0.72 \end{bmatrix}, \quad \hat B_2 = \begin{bmatrix} 0.41 & 0.27 & 0.26 \\ 0.27 & 0.40 & 0.25 \\ 0.26 & 0.25 & 0.41 \end{bmatrix}, \quad \hat B_3 = \begin{bmatrix} 0.22 & 0.20 \\ 0.20 & 0.80 \end{bmatrix}.$$

We note that even though the vertices in the HSBM are perfectly clustered into the subgraphs (i.e., $\hat H_i = H_i$ for all $i \in [8]$), the actual $B$'s differ slightly from their estimates, but this difference is quite small.

The performance of Algorithm 1 in this simulation setting can be seen as a consequence of Theorem 9 below, in which we prove that under modest assumptions on an underlying 2-level hierarchical stochastic blockmodel, Algorithm 2 yields a consistent estimate of the dissimilarity matrix $S := [T(H_i, H_j)]$.


Theorem 9. Suppose $G$ is a hierarchical stochastic blockmodel satisfying Assumptions 1 and 2. Suppose that $R$ is fixed and the $\{H_r\}$ correspond to $M$ different motifs, i.e., the set $\{\xi_1, \xi_2, \ldots, \xi_R\}$ has $M < R$ distinct elements. Given the assumptions of Theorem 8 and Lemma 6, the procedure in Algorithm 1 yields perfect estimates $\hat H_1 = H_1, \ldots, \hat H_R = H_R$ of $H_1, \ldots, H_R$ and $\hat S$ of $S$ asymptotically almost surely.


Proof: By Lemma 6, the clustering provided by Step 2 of Algorithm 1 will be perfect asymptotically almost surely. Given this, $\hat H_1 = H_1, \ldots, \hat H_R = H_R$ are consistent estimates of $H_1, \ldots, H_R$. Theorem 8 then implies that $\hat S$ yields a consistent estimate of $S$; i.e., for each $i, j$, $|\hat S(i,j) - S(i,j)| \to 0$ as $n \to \infty$. □


With assumptions as in Theorem 9, any level-$\gamma$ test using $S_{ij}$ corresponds to an at most level-$(\gamma + 2\eta)$ test using $\hat S_{ij}$.
In this case, asymptotically almost surely, the p-values of entries of $\hat S$ corresponding to different motifs will all converge to 0 as $n\pi_{\min} \to \infty$, and the p-values of entries of $\hat S$ corresponding to the same motifs will all be bounded away from 0 as $n\pi_{\min} \to \infty$. This immediately leads to the following corollary.

Corollary 10. With assumptions as in Theorem 9, clustering the matrix of p-values associated with $\hat S$ yields a consistent clustering of $\{\hat H_i\}_{i=1}^{R}$ into motifs.
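One way to attach a p-value to an entry of the dissimilarity matrix is a permutation test on the pooled embedded points. The Python sketch below illustrates that idea; it is a stand-in under stated assumptions, not the resampling scheme used in the paper (the experiments refer to bootstrapped p-values, whose details are not given in this section). The names `permutation_p_value` and `num_perms`, the Gaussian toy data, and the bandwidth are hypothetical, and the two point clouds are assumed to be already aligned in a common coordinate system.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)


def mmd_statistic(X, Y, sigma):
    """Two-sample U-statistic with a Gaussian kernel (same form as Eq. (3))."""
    n, m = len(X), len(Y)
    within_x = sum(gaussian_kernel(X[i], X[j], sigma)
                   for i in range(n) for j in range(n) if i != j)
    within_y = sum(gaussian_kernel(Y[k], Y[l], sigma)
                   for k in range(m) for l in range(m) if k != l)
    cross = sum(gaussian_kernel(x, y, sigma) for x in X for y in Y)
    return (within_x / (n * (n - 1)) - 2.0 * cross / (m * n)
            + within_y / (m * (m - 1)))


def permutation_p_value(X, Y, num_perms=99, sigma=1.0, seed=0):
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = mmd_statistic(X, Y, sigma)
    pooled = list(X) + list(Y)
    n = len(X)
    exceed = 0
    for _ in range(num_perms):
        rng.shuffle(pooled)
        if mmd_statistic(pooled[:n], pooled[n:], sigma) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (1 + exceed) / (1 + num_perms)


data_rng = random.Random(1)


def cloud(center, size=8, noise=0.1):
    return [(data_rng.gauss(center, noise), data_rng.gauss(center, noise))
            for _ in range(size)]


p_far = permutation_p_value(cloud(0.0), cloud(3.0))   # different distributions
p_same = permutation_p_value(cloud(0.0), cloud(0.0))  # same distribution
print(p_far, p_same)
```

Clustering the resulting matrix of p-values, as in Corollary 10, then groups subgraphs whose pairwise tests fail to reject.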


Theorem [9] provides a proof of concept inference result 
for our algorithm for graphs with simple hierarchical 
structure, and we will next extend our setting and theory 
to a more complex hierarchical setting. 


4 Multilevel HSBM 

In many real data applications (see, for example, Section 5), the hierarchical structure of the graph extends beyond two levels. We now extend the HSBM model of Definition 3 (which, for ease of exposition, was initially presented in the 2-level hierarchical setting) to incorporate more general hierarchical structure. With the HSBM of Definition 3 being a 2-level HSBM (or 2-HSBM), we inductively define an $\ell$-level HSBM (or $\ell$-HSBM) for $\ell \in \mathbb{Z}_{\geq 3}$ as follows.

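Definition 11 ($\ell$-level hierarchical stochastic blockmodel, $\ell$-HSBM). We say that $(X, A) \sim \mathrm{RDPG}(F^{(\ell)})$ is an instantiation of a $D^{(\ell)}$-dimensional $\ell$-level HSBM if the distribution $F^{(\ell)}$ can be written as

$$F^{(\ell)} = \sum_{i=1}^{R^{(\ell)}} \pi^{(\ell)}(i)\, F_i^{(\ell)}, \tag{6}$$

where

i. $\pi^{(\ell)} \in (0,1)^{R^{(\ell)}}$ with $\sum_i \pi^{(\ell)}(i) = 1$;

ii. $F^{(\ell)}$ has support on the rows of $[\xi_1^{(\ell)} \mid \cdots \mid \xi_{R^{(\ell)}}^{(\ell)}]^{T}$, where for each $i \in [R^{(\ell)}]$, $F_i^{(\ell)}$ has support on the rows of $(\xi_i^{(\ell)})^{T}$;

iii. for each $i \in [R^{(\ell)}]$, an RDPG graph drawn according to $(Y, B) \sim \mathrm{RDPG}(F_i^{(\ell)})$ is an $h$-level HSBM with $h \leq \ell - 1$, with at least one such $h = \ell - 1$.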

Simply stated, an $\ell$-level HSBM graph is an RDPG (in fact, it is an SBM with potentially many more than $R^{(\ell)}$ blocks) for which the vertex set can be partitioned into $R^{(\ell)}$ subgraphs, each of which is itself an $h$-level HSBM with $h \leq \ell - 1$, with at least one such $h = \ell - 1$. Here $(\xi_i^{(\ell)})^{T}$ denotes the matrix whose rows are the latent positions characterizing the block probability matrix for subgraph $i$.

As in the 2-level case, to ease notation and facilitate theoretical developments, in this paper we will make the following assumptions on the more general $\ell$-level HSBM. Letting $(X, A) \sim \mathrm{RDPG}(F^{(\ell)})$ be an instantiation of a $D^{(\ell)}$-dimensional $\ell$-level HSBM, we further assume:
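Assumption 3: (Multilevel affinity) For each $k \in \{2, 3, \ldots, \ell\}$ the constants

$$q^{(k)} := \min_{i,j,h} \langle \xi_i^{(k)}(:,j),\, \xi_i^{(k)}(:,h) \rangle, \tag{7}$$

$$p^{(k)} := \max_{i \neq j}\, \max_{m,h} \langle \xi_i^{(k)}(:,h),\, \xi_j^{(k)}(:,m) \rangle \tag{8}$$

satisfy $p^{(k)} < q^{(k)}$.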

Assumption 4: (Subspace structure) For each $i \in [R^{(\ell)}]$, $F_i^{(\ell)}$ has support on the rows of $(\xi_i^{(\ell)})^{T}$, which collectively satisfy

$$[\xi_1^{(\ell)} \mid \cdots \mid \xi_{R^{(\ell)}}^{(\ell)}]^{T} = \begin{bmatrix} (\xi_1^{(\ell-1)})^{T} & 0 & \cdots & 0 \\ 0 & (\xi_2^{(\ell-1)})^{T} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & (\xi_{R^{(\ell)}}^{(\ell-1)})^{T} \end{bmatrix} + A^{(\ell)} = Z^{(\ell)} + A^{(\ell)}, \tag{9}$$

where $Z^{(\ell)} \circ A^{(\ell)} = 0$ ($\circ$ again being the Hadamard product). For each $i \in [R^{(\ell)}]$, an RDPG graph drawn according to $(Y, B) \sim \mathrm{RDPG}(F_i^{(\ell-1)})$, where $F_i^{(\ell-1)}$ has support on the rows of $(\xi_i^{(\ell-1)})^{T}$, is an at most $(\ell-1)$-level HSBM (with at least one subgraph being an $(\ell-1)$-level HSBM). In addition, we assume similar subspace structure recursively at every level of the hierarchy.

Fig. 4: Notional depiction of a general hierarchical graph structure. The colored nodes in the first and second level of the tree (below the root node) correspond to induced subgraphs and associated motifs.
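The affinity condition of Assumption 3 compares inner products of latent positions within a subgraph against those across subgraphs. The Python sketch below computes the two constants for a toy example and checks that the cross-subgraph maximum is below the within-subgraph minimum. The function name `affinity_constants` and the latent positions are hypothetical; the positions are chosen so that each vector is mostly supported on its own subgraph's coordinates with a small off-block perturbation, in the spirit of the $Z + A$ decomposition of Assumption 4.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def affinity_constants(latent_blocks):
    """Return (p, q) for a list of subgraphs.

    latent_blocks[i] is the list of latent position vectors (one per block)
    of subgraph i.  q is the smallest inner product between two latent
    positions of the same subgraph; p is the largest inner product between
    latent positions of two different subgraphs.
    """
    q = min(dot(x, y)
            for positions in latent_blocks
            for x in positions for y in positions)
    p = max(dot(x, y)
            for i, pos_i in enumerate(latent_blocks)
            for j, pos_j in enumerate(latent_blocks) if i != j
            for x in pos_i for y in pos_j)
    return p, q


# Two hypothetical subgraphs with two blocks each, embedded in R^4.
subgraph_1 = [(0.8, 0.2, 0.05, 0.0), (0.2, 0.8, 0.0, 0.05)]
subgraph_2 = [(0.05, 0.0, 0.7, 0.3), (0.0, 0.05, 0.3, 0.7)]

p, q = affinity_constants([subgraph_1, subgraph_2])
print(p, q, p < q)
```

In a random dot product graph these inner products are edge probabilities, so the check says that any two vertices in the same subgraph connect with higher probability than any two vertices in different subgraphs.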

Remark 4. As was the case with Assumption 2, Assumption 4 plays a crucial role in our algorithmic development. Indeed, under this assumption we can view successive levels of the hierarchical HSBM as RDPGs in successively smaller dimensions. Indeed, it is these $(Y, B) \sim \mathrm{RDPG}(F_i^{(\ell-1)})$ which we embed in Step 3 of Algorithm 1, and we embed them into the smaller-dimensional space of the subgraph rather than $\mathbb{R}^{D}$. For example, suppose $G$ is an $\ell$-level HSBM, and $G$ has $R$ subgraphs each of which is an $(\ell-1)$-level HSBM. Furthermore suppose that each of these subgraphs itself has $R$ subgraphs each of which is an $(\ell-2)$-level HSBM, and so on. If the SBMs at the lowest level are all $d$-dimensional, then $G$ can be viewed as an $R^{\ell-1}d$-dimensional RDPG. In practice, to avoid this curse of dimensionality, we could embed each subgraph at level $k < \ell$ into $R$ dimensions and still practically uncover the subgraph (but not the motif!) structure. This assumption also reinforces the affinity structure of the subgraphs, which is a key component of our theoretical developments.
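The dimension count in the example of Remark 4 can be written as a one-line recursion. The worked values below ($R = 3$, $d = 2$, $\ell = 4$) are illustrative only and do not come from the paper; $D_k$ is shorthand introduced here for the dimension of a $k$-level HSBM in that example.

```latex
% Each level stacks R lower-level subgraphs on disjoint coordinate blocks,
% so the dimension multiplies by R at every level above the lowest one.
\begin{align*}
  D_1 &= d, \qquad D_k = R \, D_{k-1} \quad (k = 2, \ldots, \ell)
  \quad\Longrightarrow\quad D_\ell = R^{\ell-1} d, \\
  R &= 3,\; d = 2,\; \ell = 4:
  \qquad D_4 = 3^{3} \cdot 2 = 54 .
\end{align*}
```

With these values the top-level graph would require a 54-dimensional embedding, whereas embedding only to separate the $R = 3$ subgraphs needs 3 dimensions, which is the practical shortcut the remark describes.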

In the 2-level HSBM setting, we can provide theoretical results on the consistency of our motif detection procedure, Algorithm 1. As it happens, in this simpler setting, the algorithm terminates after Step 6; that is, after clustering the induced subgraphs into motifs. There is no further recursion on these motifs. We next extend Theorem 9 to the multi-level HSBM setting as follows. In the following theorem, for an RDPG $G = (X, A)$, let $\hat X_G$ be the ASE of $G$ and let $X_G = X$ be the true latent positions of $G$; i.e., $\mathbb{E}(A) = XX^{T}$.

Theorem 12. With notation as above, let $(X, A) \sim \mathrm{RDPG}(F^{(\ell)})$ be an instantiation of a $D^{(\ell)}$-dimensional, $\ell$-level HSBM with $\ell$ fixed. Given Assumptions 3 and 4, further suppose that for each $k \in \{2, 3, \ldots, \ell\}$ every $k$-level HSBM subgraph, $G$, of $(X, A)$ satisfies

i. the number of components in the mixture, Eq. (6) for $G$, which we shall denote by $R_G$, is known and fixed with respect to $n$, and $\{H_i^G\}_{i=1}^{R_G}$ are these $R_G$ subgraphs of $G$; additionally, the $R_G$ mixture coefficients (associated with Eq. (6) for $G$) are all strictly greater than 0;

ii. if $G$ is $D_G$ dimensional, then these $R_G$ subgraphs (when viewed as the subgraphs corresponding to the diagonal blocks of $Z_G$ of Eq. (9), to be embedded into $< D_G$ dimensions as in Remark 4) correspond to $M_G$ different motifs with $M_G$ fixed with respect to $n$.

It follows then that for all such $G$, the procedure in Algorithm 1 simultaneously yields perfect estimates $\hat H_1^G = H_1^G, \hat H_2^G = H_2^G, \ldots, \hat H_{R_G}^G = H_{R_G}^G$ of $\{H_i^G\}_{i=1}^{R_G}$ asymptotically almost surely. It follows then that for each such $G$, $\hat S_G = [T(\hat X_{\hat H_i^G}, \hat X_{\hat H_j^G})]$ yield consistent estimates of $S_G = [T(X_{H_i^G}, X_{H_j^G})]$, which allows for the asymptotically almost surely perfect detection of the $M_G$ motifs.

We note here that in Theorem 12, $\ell$ and the total number of subgraphs at each level of the hierarchy are fixed with respect to $n$. As $n$ increases, the size of each subgraph at each level is also increasing (linearly in $n$), and therefore any separation between $p^{(k)}$ and $q^{(k)}$ at level $k$ will be sufficient to perfectly separate the subgraphs asymptotically almost surely. The proof of the above theorem then follows immediately from Theorem 9 and induction on $\ell$, and so is omitted.


Theorem 12 states that, under modest assumptions, Algorithm 1 yields perfect motif detection and classification at every level in the hierarchy. From a technical viewpoint, this theorem relies on a $2 \to \infty$ norm bound on the residuals of $\hat X$ about $X$ (see [18]), which is crucial to the perfect recovery of precisely the $R_G$ large-scale subgraphs. This bound, in turn, only guarantees this perfect recovery when the average degree is at least of order $\sqrt{n}\,\log^{2}(n)$. We surmise that for subsequent inference tasks that are more robust to the identification of the large-scale subgraphs, results can be established in sparser regimes.


Moreover, when applying this procedure to graphs which violate our HSBM model assumptions (for example, when applying the procedure to real data), we encounter error propagation inherent to recursive procedures. In Algorithm 1, there are three main sources of error propagation: errorful clusterings; the effect of these errorfully-inferred subgraphs on $\hat S$; and subsequent clustering and analysis within these errorful subgraphs. We briefly address these three error sources below.


First, finite-sample clustering is inherently errorful, and misclustered vertices contribute to degradation of power in the motif detection test statistic. While we prove the asymptotic consistency of our clustering procedure in Lemma 6, there are a plethora of other graph clustering procedures we might employ in the small-sample setting, including modularity-based methods such as Louvain [8] and fastgreedy [35], and random walk-based methods such as walktrap [10]. Understanding the impact that the particular clustering procedure has on subsequent motif detection is crucial, as is characterizing the common properties of misclustered vertices; e.g., in a stochastic block model, are misclustered vertices overwhelmingly likely to be low-degree?

Second, although testing based on $T$ is asymptotically robust to a modest number of misclustered vertices, namely $o(\max_i n\pi(i))$ vertices, the finite-sample robustness of this test statistic remains open. Lastly, we need to understand the robustness properties of further clustering these errorfully observed motifs. In [48], the authors propose a model for errorfully observed random graphs, and study the subsequent impact of the graph error on vertex classification. Adapting their model and methodology to the framework of spectral clustering will be essential for understanding the robustness properties of our algorithm, and is the subject of present research.


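5 Experiments

We next apply our algorithm to two real data networks: the Drosophila connectome from [6] and the Friendster social network.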


5.1 Motif detection in the Drosophila Connectome 

The cortical column conjecture suggests that neurons are connected in a graph which exhibits motifs representing repeated processing modules. (Note that we understand that there is controversy surrounding the definition and even the existence of "cortical columns"; our consideration includes "generic" recurring circuit motifs, and is not limited to the canonical Mountcastle-style column [4].) While the full cortical connectome necessary to rigorously test this conjecture is not yet available even on the scale of fly brains, in [6] the authors were able to construct a portion of the Drosophila fly medulla connectome which exhibits columnar structure.

This graph is built by first constructing the full connectome between 379 named neurons (believed to be a single column) and then sparsely reconstructing the connectome between and within surrounding columns via a semi-automated procedure. The resulting connectome$^{2}$ has 1748 vertices in its largest connected component, the adjacency matrix of which is visualized in the upper left of Figure 5. We visualize our Algorithm 1 run on this graph in Figure 5. First we embed the graph into $\mathbb{R}^{13}$ (13 chosen according to the singular value thresholding method applied to a partial scree plot; see Remark 3) and, to alleviate sparsity concerns, project the embedding onto the sphere. The resulting points are then clustered into $R = 8$ clusters ($R$ chosen as in Remark 3) of sizes $|V(\hat H_1)| = 176$, $|V(\hat H_2)| = 237$, $|V(\hat H_3)| = 434$, $|V(\hat H_4)| = 237$, $|V(\hat H_5)| = 142$, $|V(\hat H_6)| = 237$, $|V(\hat H_7)| = 115$, and $|V(\hat H_8)| = 170$ vertices. These clusters are displayed in the upper right of Figure 5. We then compute the corresponding $\hat S$ matrix after re-embedding each of these clusters (bottom of Figure 5). In the heat map representation of $\hat S$, the similarity of the communities is represented on the spectrum between white and red, with white representing highly similar communities and red representing highly
dissimilar communities. For example, the bootstrapped p-value (from 200 bootstrap samples) associated with $T(\hat H_6, \hat H_8)$ is 0.195, with $T(\hat H_2, \hat H_6)$ is 0.02, and with $T(\hat H_6, \hat H_1)$ is 0.005.
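The projection of an embedding onto the sphere amounts to rescaling each embedded point to unit Euclidean norm. A minimal Python sketch follows; the function name `project_to_sphere` is hypothetical, and leaving all-zero rows unchanged is an assumption made here to avoid division by zero, not a choice documented in the paper.

```python
import math


def project_to_sphere(points):
    """Rescale each embedded point to unit Euclidean norm.

    Rows that are exactly zero are returned unchanged (assumption).
    """
    projected = []
    for x in points:
        norm = math.sqrt(sum(c * c for c in x))
        if norm == 0.0:
            projected.append(list(x))
        else:
            projected.append([c / norm for c in x])
    return projected


embedding = [[3.0, 4.0], [0.0, 0.0], [0.1, 0.0]]
print(project_to_sphere(embedding))
```

After this step, two vertices with very different degrees but the same direction in the embedding land on the same point, which is why the projection helps when degrees vary widely within a community.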

We next apply hierarchical clustering to $\hat S$ to uncover the repeated motif structure (with the resulting dendrogram displayed in Figure 5). Both methods uncovered two repeated motifs, the first consisting of subgraphs 1 and 4 and the second consisting of subgraphs 2, 6, and 8. Note that the hierarchical clustering also reveals 2nd level motif repetition within the second motif given by {6,8}. Indeed, our method uncovers repeated hierarchical structure in this connectome, and we are presently working with neurobiologists to determine the biological significance of our clusters and motifs.

$^{2}$ Available from the Open Connectome Project, http://openconnecto.me/graph-services/download/ (see fly).

Fig. 5: Visualization of our method applied to the Drosophila connectome. We show the adjacency matrix (upper left); the clustering derived via ASE, projection to the sphere, and clustering via our algorithm (upper right); and lastly $\hat S$ calculated from these clusters (bottom). Clustering the subgraphs based on this $\hat S$ suggests two repeated motifs: {1,4} and {2,6,8}. Note that the hierarchical clustering also reveals 2nd level motif repetition within the second motif given by {6,8}.
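The step of hierarchically clustering a dissimilarity matrix into motifs can be sketched with a small agglomerative routine. The Python code below uses average linkage, which is an assumption (the linkage used in the paper is not stated in this section); the function name `average_linkage` and the 5-by-5 toy matrix are hypothetical and are not values from the experiments.

```python
def average_linkage(S, num_clusters):
    """Agglomerative clustering of items 0..len(S)-1 from a dissimilarity matrix S.

    Repeatedly merges the two clusters with the smallest average pairwise
    dissimilarity until num_clusters clusters remain.
    """
    clusters = [[i] for i in range(len(S))]

    def dist(a, b):
        return sum(S[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy dissimilarities: items {0, 1} are mutually similar, as are {2, 3, 4}.
S_toy = [
    [0.00, 0.02, 0.30, 0.28, 0.31],
    [0.02, 0.00, 0.29, 0.30, 0.27],
    [0.30, 0.29, 0.00, 0.03, 0.05],
    [0.28, 0.30, 0.03, 0.00, 0.01],
    [0.31, 0.27, 0.05, 0.01, 0.00],
]
motifs = average_linkage(S_toy, 2)
print(motifs)  # prints [[0, 1], [2, 3, 4]]
```

Recording the merge order and heights instead of stopping at a fixed number of clusters would give the dendrogram, from which nested structure such as a sub-motif inside a motif can be read off.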


5.2 Motif detection in the Friendster network 

We next apply our methodology to analyze and classify communities in the Friendster social network. The Friendster social network contains roughly 60 million users and 2 billion connections/edges. In addition, there are roughly 1 million communities at the local scale. Because we expect the social interactions in these communities to inform the function of the different communities, we expect to observe distributional repetition among the graphs associated with these communities.

Implementing Algorithm 1 on the very large Friendster graph presents computational challenges. To overcome this challenge in scalability, we use the specialized SSD-based graph processing engine FlashGraph [49], which is designed to analyze graphs with billions of nodes. With FlashGraph, we adjacency spectral embed the Friendster adjacency matrix into $\mathbb{R}^{D}$, where $D = 14$ is chosen using singular value thresholding on the partial scree plot (see Remark 3). Using the model selection methodology outlined in Remark 3, we find the best coarse-grained clustering of the graph is achieved with $R = 15$ large-scale clusters ranging in size from $10^{6}$ to 15.6 million vertices (note that to alleviate sparsity concerns, we projected the embedding onto the sphere before clustering). After re-embedding the induced subgraphs associated with these 15 clusters, we use a linear time estimate of the test statistic $T$ to compute $\hat S$, the matrix of estimated pairwise dissimilarities among the subgraphs. See Figure 6 for a heat map depicting $\hat S \in \mathbb{R}^{15 \times 15}$. In the heat map, the similarity of the communities is represented on the spectrum between white and red, with white representing highly similar communities and red representing highly dissimilar communities. From the figure, we can see clear repetition in the subgraph distributions; for example, we see a repeated motif including subgraphs $\hat H_5$ and $\hat H_2$ (among others) and a clear motif including subgraphs $\{\hat H_{10}, \hat H_{12}, \hat H_9\}$.
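A standard linear-time estimate of a kernel two-sample statistic averages a four-term kernel expression over disjoint consecutive pairs of points, so that each point is touched once. The Python sketch below implements that well-known estimator as an illustration; whether it is the exact linear-time estimate used in the paper is an assumption. The function name `linear_time_mmd`, the truncation to the shorter sample, and the toy inputs are also hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)


def linear_time_mmd(X, Y, sigma=1.0):
    """Average of h over disjoint consecutive pairs, where
    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1).

    Uses only the first 2 * floor(min(n, m) / 2) points of each sample
    (assumption: extra points in the longer sample are ignored).
    """
    num_pairs = min(len(X), len(Y)) // 2
    if num_pairs == 0:
        raise ValueError("each sample needs at least two points")
    total = 0.0
    for i in range(num_pairs):
        x1, x2 = X[2 * i], X[2 * i + 1]
        y1, y2 = Y[2 * i], Y[2 * i + 1]
        total += (gaussian_kernel(x1, x2, sigma) + gaussian_kernel(y1, y2, sigma)
                  - gaussian_kernel(x1, y2, sigma) - gaussian_kernel(x2, y1, sigma))
    return total / num_pairs


zeros = [(0.0, 0.0)] * 4
shifted = [(100.0, 0.0)] * 4
print(linear_time_mmd(zeros, zeros), linear_time_mmd(zeros, shifted))  # prints 0.0 2.0
```

The cost is linear in the sample size rather than quadratic, at the price of higher variance than the full U-statistic, which is the usual trade-off when subgraphs have millions of vertices.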

Formalizing the motif detection step, we next employ hierarchical clustering to cluster $\hat S$ into motifs; see Figure 6 for the corresponding hierarchical clustering dendrogram, which suggests that our algorithm does in fact uncover repeated motif structure at the coarse-grained level in the Friendster graph. While it may be difficult to draw meaningful inference from repeated motifs at the scale of hundreds of thousands to millions of vertices, if these motifs are capturing a common HSBM structure within the subgraphs in the motif, then we can employ our algorithm recursively on each motif to tease out further hierarchical structure.

Fig. 6: Heat map depiction of the level one Friendster estimated dissimilarity matrix $\hat S \in \mathbb{R}^{15 \times 15}$. In the heat map, the similarity of the communities is represented on the spectrum between white and red, with white representing highly similar communities and red representing highly dissimilar communities. In addition, we cluster $\hat S$ using hierarchical clustering and display the associated hierarchical clustering dendrogram.

Exploring this further, we consider three subgraphs $\{\hat H_2, \hat H_8, \hat H_{15}\}$, two of which are in the same motif (8 and 15) and both differing significantly from subgraph 2 according to $\hat S$. We embed these subgraphs into $\mathbb{R}^{26}$ (26 chosen as outlined in Remark 3), perform a Procrustes alignment of the vertex sets of the three subgraphs, cluster each into 4 clusters (4 chosen to optimize silhouette width in $k$-means clustering), and estimate both the block connection probability matrices $\hat P_2, \hat P_8, \hat P_{15}$ and the block membership probabilities $\hat\pi_2, \hat\pi_8, \hat\pi_{15}$ for each of the three graphs. We calculate

$$\|\hat P_2 - \hat P_8\|_F = 0.033; \quad \|\hat P_2 - \hat P_{15}\|_F = 0.027; \quad \|\hat P_8 - \hat P_{15}\|_F = 0.0058;$$

$$\|\hat\pi_2 - \hat\pi_8\| = 0.043; \quad \|\hat\pi_2 - \hat\pi_{15}\| = 0.043; \quad \|\hat\pi_8 - \hat\pi_{15}\| = 0.0010;$$

which suggests that the repeated structure our algorithm uncovers is SBM substructure, thus ensuring that we can proceed to apply our algorithm recursively to the subsequent motifs.
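The comparison of estimated block connection probability matrices can be illustrated on a toy graph. The Python sketch below estimates a block matrix from an adjacency matrix and cluster labels by counting edges over vertex pairs, and then takes a Frobenius distance between two such matrices. The function names and the 4-vertex example are hypothetical, and the sketch assumes the blocks of the two matrices are already listed in matching order (the paper's alignment step is not reproduced here).

```python
import math


def estimate_block_matrix(A, labels, k):
    """Empirical edge frequency between each pair of clusters.

    A is a symmetric 0/1 adjacency matrix without self-loops,
    labels[v] in {0, ..., k-1} is the cluster of vertex v.
    """
    edges = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    n = len(A)
    for u in range(n):
        for v in range(u + 1, n):
            a, b = labels[u], labels[v]
            edges[a][b] += A[u][v]
            pairs[a][b] += 1
            if a != b:  # keep the estimate symmetric
                edges[b][a] += A[u][v]
                pairs[b][a] += 1
    return [[edges[a][b] / pairs[a][b] if pairs[a][b] else 0.0
             for b in range(k)] for a in range(k)]


def frobenius_distance(P, Q):
    return math.sqrt(sum((p - q) ** 2
                         for row_p, row_q in zip(P, Q)
                         for p, q in zip(row_p, row_q)))


# Toy graph: edges 0-1, 2-3 (within clusters) and 0-2 (across clusters).
A_toy = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
P_hat = estimate_block_matrix(A_toy, [0, 0, 1, 1], 2)
print(P_hat)  # prints [[1.0, 0.25], [0.25, 1.0]]
print(frobenius_distance(P_hat, [[1.0, 0.0], [0.0, 1.0]]))
```

A small Frobenius distance between two subgraphs' estimated matrices, together with close membership proportions, is the kind of evidence the text uses to argue that subgraphs 8 and 15 share SBM structure.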

As a final point, we recursively apply Algorithm 1 to the subgraph $\hat H_{11}$. We first embed the graph into $\mathbb{R}^{26}$ (again, with 26 chosen as outlined in Remark 3). Next, using the model selection methodology outlined in Remark 3, we cluster the vertices into $R = 13$ large-scale clusters of sizes ranging from 500K to 2.7M vertices. We then use a linear time estimate of the test statistic $T$ to compute $\hat S$ (see Figure 7), and note that there appear to be clear repeated motifs (for example, subgraphs 8 and 12) among the $\hat H$'s. We run hierarchical clustering to cluster the 13 subgraphs, and note that the associated dendrogram, as shown in Figure 7, shows that our algorithm again uncovered some repeated level-2 structure in the Friendster network. We can, of course, recursively apply our algorithm still further to tease out the motif structure at increasingly fine-grained scale.

Ideally, when recursively running Algorithm 1, we would like to simultaneously embed and cluster all subgraphs in the motif. In addition to potentially reducing embedding variance, being able to efficiently simultaneously embed all the subgraphs in a motif could greatly increase algorithmic scalability in large networks with a very large number of communities at local-scale. In order to do this, we need to understand the nature of the repeated structure within the motifs. This repeated structure can inform an estimation of a motif average (an averaging of the subgraphs within the motif), which can then be embedded into an appropriate Euclidean space in lieu of embedding all of the subgraphs in the motif separately. However, this averaging presents several novel challenges, as these subgraphs may be of very different orders and may be errorfully obtained, which could lead to compounded errors in the averaging step. We are presently working to determine a robust averaging procedure (or a simultaneous embedding procedure akin to JOFC [50]) which exploits the common structure within the motifs.

Fig. 7: Heat map depiction of the level two Friendster estimated dissimilarity matrix $\hat S \in \mathbb{R}^{13 \times 13}$ of $\hat H_{11}$. In the heat map, the similarity of the communities is represented on the spectrum between white and red, with white representing highly similar communities and red representing highly dissimilar communities. In addition, we cluster $\hat S$ using hierarchical clustering and display the associated hierarchical clustering dendrogram.

6 Conclusion

In summary, we provide an algorithm for community detection and classification for hierarchical stochastic blockmodels. Our algorithm depends on a consistent lower-dimensional embedding of the graph, followed by a valid and asymptotically powerful nonparametric test procedure for the determination of distributionally equivalent subgraphs known as motifs. In the case of a two-level hierarchical stochastic block model, we establish theoretical guarantees on the consistency of our estimates for the induced subgraphs and the validity of our subsequent tests.

While the hierarchical stochastic block model is a very particular random graph model, the hierarchical nature of the HSBM (that of smaller subgraphs that are densely connected within and somewhat loosely connected across) is a central feature of many networks. Because our results are situated primarily in the context of random dot product graphs, and because random dot product graphs can be used to closely approximate many independent edge graphs [51], we believe that our algorithm can be successfully adapted for the determination of multiscale structure in significantly more intricate models.

By performing community detection and classification on the Drosophila connectome and on the social network Friendster, we demonstrate that our algorithm can be feasibly deployed on real (and, in the case of Friendster, large!) graphs. We leverage state-of-the-art software packages FlashGraph and igraph to substantially reduce computation time. In both graphs, our algorithm detects and classifies multiple similar communities. Of considerable interest and ongoing research is the analysis of the functional or structural features of these distinct communities. Because our algorithm can be applied recursively to uncover finer-grained structure, we are hopeful that these methods can contribute to a deeper understanding of the implications of statistical subgraph similarity on the structure and function of social and biological networks.


7 Acknowledgments 

The authors thank Zheng Da, Disa Mhembere, and Randal Burns for assistance in processing the Friendster graph using FlashGraph [49]; Gabor Csardi and Edo Airoldi for assistance in implementing our algorithm in igraph [52]; Daniel L. Sussman for helpful discussions in formulating the HSBM framework; and R. Jacob Vogelstein and Joshua T. Vogelstein for suggesting this line of research. This work is partially supported by the XDATA & GRAPHS programs of the Defense Advanced Research Projects Agency (DARPA).


























References 

[1] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.

[2] E. Bullmore and O. Sporns, "Complex brain networks: Graph theoretical analysis of structural and functional systems," Nature Rev. Neurosci, vol. 10, pp. 186-198, 2009.

[3] D. J. de Solla Price, "Networks of scientific papers," Science, vol. 149, pp. 510-515, 1965.

[4] V. B. Mountcastle, "The columnar organization of the neocortex," Brain, vol. 120, no. 4, pp. 701-722, 1997.

[5] G. Marcus, A. Marblestone, and T. Dean, "The atoms of neural computation," Science, vol. 346, no. 6209, pp. 551-552, 2014.

[6] S. Takemura, A. Bharioke, Z. Lu, A. Nern, S. Vitaladevuni, P. K. Rivlin, W. T. Katz, D. J. Olbris, S. M. Plaza, P. Winston, T. Zhao, J. A. Horne, R. D. Fetter, S. Takemura, K. Blazek, L.-A. Chang, O. Ogundeyi, M. A. Saunders, V. Shapiro, C. Sigmund, G. M. Rubin, L. K. Scheffer, I. A. Meinertzhagen, and D. B. Chklovskii, "A visual motion detection circuit suggested by Drosophila connectomics," Nature, vol. 500, no. 7461, pp. 175-181, 2013.

[7] P. J. Bickel and A. Chen, "A nonparametric view of network models and Newman-Girvan and other modularities," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, pp. 21068-73, 2009.

[8] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, 2008.

[9] M. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review, vol. 69, pp. 1-15, Feb. 2004.

[10] P. Pons and M. Latapy, "Computing communities in large networks using random walks," in Proceedings of the 20th international conference on Computer and Information Sciences, 2005, pp. 284-293.

[11] M. Rosvall and C. T. Bergstrom, "Maps of random walks on complex networks reveal community structure," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, 2008.

[12] F. McSherry, "Spectral partitioning of random graphs," in Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, 2001, pp. 529-537.

[13] K. Rohe, S. Chatterjee, and B. Yu, "Spectral clustering and the high-dimensional stochastic blockmodel," Annals of Statistics, vol. 39, pp. 1878-1915, 2011.

[14] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe, "A consistent adjacency spectral embedding for stochastic blockmodel graphs," Journal of the American Statistical Association, vol. 107, pp. 1119-1128, 2012.

[15] U. V. Luxburg, "A tutorial on spectral clustering," Statistics and computing, vol. 17, pp. 395-416, 2007.

[16] T. Qin and K. Rohe, "Regularized spectral clustering under the degree-corrected stochastic blockmodel," Advances in Neural Information Processing Systems, 2013.

[17] K. Chaudhuri, F. Chung, and A. Tsiatas, "Spectral partitioning of graphs with general degrees and the extended planted partition model," in Proceedings of the 25th conference on learning theory, 2012.

[18] V. Lyzinski, D. L. Sussman, M. Tang, A. Athreya, and C. E. Priebe, "Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding," Electronic Journal of Statistics, vol. 8, pp. 2905-2922, 2014.

[19] E. Mossel, J. Neeman, and A. Sly, "Stochastic block models and reconstruction," Probability Theory and Related Fields, in press.

[20] B. Hajek, Y. Wu, and J. Xu, "Achieving exact cluster recovery threshold via semidefinite programming," IEEE Transactions on Information Theory, vol. 62, pp. 2788-2797, 2016.

[21] C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou, "Achieving optimal misclassification proportion in stochastic blockmodel," 2015, arXiv preprint at http://arxiv.org/abs/1505.03722

[22] J. Lei and A. Rinaldo, "Consistency of spectral clustering in stochastic blockmodels," Annals of Statistics, vol. 43, pp. 215-237, 2015.

[23] H. Pao, G. A. Coppersmith, and C. E. Priebe, "Statistical inference on random graphs: Comparative power analyses via Monte Carlo," Journal of Computational and Graphical Statistics, vol. 20, pp. 395-416, 2011.

[24] A. Rukhin and C. E. Priebe, "A comparative power analysis of the maximum degree and size invariants for random graph inference," Journal of Statistical Planning and Inference, vol. 141, pp. 1041-1046, 2011.

[25] D. Koutra, J. T. Vogelstein, and C. Faloutsos, "DeltaCon: A principled massive-graph similarity function," in Proceedings of the SIAM International Conference in Data Mining, 2013, pp. 162-170.

[26] M. Rosvall and C. T. Bergstrom, "Mapping change in large networks," PLOS ONE, vol. 5, 2010.

[27] D. Asta and C. R. Shalizi, "Geometric network comparison," 2014, arXiv preprint, http://arxiv.org/abs/1411.1350

[28] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe, "A nonparametric two-sample hypothesis for random dot product graphs," Bernoulli, 2016, in press.

[29] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe, "A semiparametric two-sample hypothesis testing for random dot product graphs," Journal of Computational and Graphical Statistics, 2016, in press.

[30] Y. Wang and G. Wong, "Stochastic blockmodels for 




15 


directed graphs," Journal of the American Statistical 
Association, vol. 82, pp. 8-19, 1987. 

[31] P. W. Holland, K. Laskey, and S. Leinhardt, 
"Stochastic blockmodels: First steps," Social Net¬ 
works, vol. 5, pp. 109-137, 1983. 

[32] T. P. Peixoto, "Hierarchical block structures and 
high-resolution model selection in large networks," 
Physical Reviezv X, vol. 4, no. 011047, pp. 1-18, 2014. 

[33] C. L. M. Nickel, "Random dot product graphs: 
A model for social networks," Ph.D. dissertation, 
Johns Hopkins University, 2006. 

[34] P. D. Hoff, A. E. Raftery, and M. S. Handcock, 
"Latent space approaches to social network analysis," 
Journal of the American Statistical Association, vol. 97, 
pp. 1090-1098, 2002. 

[35] A. Clauset, M. E. J. Newman, and C. Moore, 
"Finding community structure in very large networks," 
Physical Review E, vol. 70, 2004. 

[36] M. Mariadassou, S. Robin, and C. Vacher, 
"Uncovering latent structure in valued graphs: A 
variational approach," Annals of Applied Statistics, 
vol. 4, pp. 715-742, 2010. 

[37] M. Sales-Pardo, R. Guimera, A. A. Moreira, and 
L. A. N. Amaral, "Extracting the hierarchical 
organization of complex systems," Proceedings of the 
National Academy of Sciences of the United States of 
America, vol. 104, 2007. 

[38] J. Leskovec, D. Chakrabarti, J. Kleinberg, 
C. Faloutsos, and Z. Ghahramani, "Kronecker graphs: An 
approach to modeling networks," Journal of Machine 
Learning Research, vol. 11, pp. 985-1042, 2010. 

[39] Y. Park, C. Moore, and J. S. Bader, "Dynamic 
networks from hierarchical Bayesian graph clustering," 
PLOS ONE, vol. 5, 2010. 

[40] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, "Link 
communities reveal multiscale complexity in networks," 
Nature, vol. 466, 2010. 

[41] D. L. Sussman, M. Tang, and C. E. Priebe, 
"Consistent latent position estimation and vertex 
classification for random dot product graphs," IEEE 
Transactions on Pattern Analysis and Machine 
Intelligence, vol. 36, pp. 48-57, 2014. 

[42] R. Vidal, "A tutorial on subspace clustering," IEEE 
Signal Processing Magazine, vol. 28, pp. 52-68, 2010. 

[43] S. Chatterjee, "Matrix estimation by universal 
singular value thresholding," The Annals of Statistics, 
vol. 43, pp. 177-214, 2014. 

[44] L. Kaufman and P. J. Rousseeuw, Finding Groups in 
Data: An Introduction to Cluster Analysis. John Wiley 
& Sons, 2009, vol. 344. 

[45] M. Zhu and A. Ghodsi, "Automatic dimensionality 
selection from the scree plot via the use of profile 
likelihood," Computational Statistics & Data Analysis, 
vol. 51, pp. 918-930, 2006. 

[46] V. Lyzinski, D. L. Sussman, D. E. Fishkind, H. Pao, 
L. Chen, J. T. Vogelstein, Y. Park, and C. E. Priebe, 
"Spectral clustering for divide-and-conquer graph 
matching," Parallel Computing, vol. 47, pp. 70-87, 
2015. 

[47] A. Clauset, C. Moore, and M. E. J. Newman, 
"Hierarchical structure and the prediction of missing 
links in networks," Nature, vol. 453, pp. 98-101, 
2008. 

[48] C. E. Priebe, D. L. Sussman, M. Tang, and J. T. 
Vogelstein, "Statistical inference on errorfully 
observed graphs," Journal of Computational and 
Graphical Statistics, vol. 24, 2015. 

[49] D. Zheng, D. Mhembere, R. Burns, J. T. Vogelstein, 
C. E. Priebe, and A. S. Szalay, "Flashgraph: 
Processing billion-node graphs on an array of commodity 
SSDs," in 13th USENIX Conference on File and Storage 
Technologies (FAST 15), 2015. 

[50] C. E. Priebe, D. J. Marchette, Z. Ma, and S. Adali, 
"Manifold matching: Joint optimization of fidelity 
and commensurability," Brazilian Journal of 
Probability and Statistics, vol. 27, pp. 377-400, 2013. 

[51] M. Tang, D. L. Sussman, and C. E. Priebe, 
"Universally consistent vertex classification for latent 
position graphs," Annals of Statistics, vol. 31, pp. 
1406-1430, 2013. 

[52] G. Csardi and T. Nepusz, "The igraph software 
package for complex network research," InterJournal, 
Complex Systems, vol. 1695, no. 5, pp. 1-9, 2006. 

[53] G. W. Stewart and J.-G. Sun, Matrix perturbation 
theory. Academic Press, 1990. 

[54] L. Lu and X. Peng, "Spectra of edge-independent 
random graphs," Electronic Journal of Combinatorics, 
vol. 20, 2013. 





Appendix 

We now provide proofs of Theorem 5 and Lemma 6. We will state and prove Theorem 5 in slightly greater generality here, by first introducing the notion of a random dot product graph with a given sparsity factor $\rho_n$.

Definition 13 (The $d$-dimensional random dot product graph with sparsity factor $\rho_n$). Let $F$ be a distribution on a set $\mathcal{X} \subset \mathbb{R}^d$ satisfying $x^\top y \in [0,1]$ for all $x, y \in \mathcal{X}$. We say $(X, A) \sim \mathrm{RDPG}(F)$ with sparsity factor $\rho_n \le 1$ if the following hold. Let $X_1, \dots, X_n \sim F$ be independent random variables and define

\[ X = [X_1 \mid \cdots \mid X_n]^\top \in \mathbb{R}^{n \times d} \quad \text{and} \quad P = \rho_n X X^\top \in [0,1]^{n \times n}. \tag{10} \]

As before, the $X_i$ are the latent positions for the random graph. The matrix $A \in \{0,1\}^{n \times n}$ is defined to be a symmetric, hollow matrix such that for all $i < j$, conditioned on $X_i, X_j$ the $A_{ij}$ are independent and

\[ A_{ij} \sim \mathrm{Bernoulli}(\rho_n X_i^\top X_j), \tag{11} \]

namely,

\[ \Pr[A \mid X] = \prod_{i<j} (\rho_n X_i^\top X_j)^{A_{ij}} (1 - \rho_n X_i^\top X_j)^{1 - A_{ij}}. \tag{12} \]

Recall that we denote the second moment matrix for the $X_i$ by $\Delta = \mathbb{E}(X_1 X_1^\top)$, and we assume that $\Delta$ is of rank $d$.
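As an illustration of Definition 13, the following minimal Python sketch samples $(X, A)$ from an RDPG with sparsity factor $\rho_n$. The choice of $F$ (a mixture of two point masses in $\mathbb{R}^2$), the function name `sample_rdpg`, and the values of $n$ and $\rho_n$ are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sample_rdpg(latent, rho, rng):
    """Sample a symmetric, hollow adjacency matrix A with
    A_ij ~ Bernoulli(rho * <X_i, X_j>) independently for i < j."""
    P = rho * latent @ latent.T                  # edge-probability matrix, eq. (10)
    if P.min() < 0 or P.max() > 1:
        raise ValueError("rho * X X^T must have entries in [0, 1]")
    n = latent.shape[0]
    # One independent Bernoulli draw per pair i < j (strict upper triangle).
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)            # symmetrize; diagonal stays zero
    return A, P

rng = np.random.default_rng(0)
# Hypothetical F: two point masses in R^2, i.e. a two-block stochastic blockmodel.
centers = np.array([[0.8, 0.2], [0.2, 0.8]])
labels = rng.integers(0, 2, size=200)
X = centers[labels]                              # rows are the latent positions X_i
A, P = sample_rdpg(X, rho=0.5, rng=rng)
```

Because $F$ here is supported on two points, the sampled graph is a stochastic blockmodel with two blocks, which is the special case of the RDPG used throughout the paper.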


Definition 14 (Embedding of $A$ and $P$). Suppose that $A$ is as in Definition 13. Then our estimate for $\rho_n^{1/2} X$ (up to rotation) is $\hat{X} = U_A S_A^{1/2}$, where $S_A \in \mathbb{R}^{d \times d}$ is the diagonal submatrix with the $d$ largest eigenvalues (in magnitude) of $|A|$ and $U_A \in \mathbb{R}^{n \times d}$ is the matrix whose orthonormal columns are the corresponding eigenvectors. Similarly, we let $U_P S_P U_P^\top$ denote the spectral decomposition of $P$. Note that $P$ is of rank $d$.
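The embedding in Definition 14 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `adjacency_spectral_embedding`, the two-point latent distribution, and the sizes used below are assumptions. Since $\hat{X}$ is only determined up to rotation, the check compares $\hat{X}\hat{X}^\top$ with $P$, which is rotation invariant.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Return X_hat = U_A S_A^{1/2}, built from the d eigenvalues of A
    that are largest in magnitude and their eigenvectors."""
    evals, evecs = np.linalg.eigh(A)                 # A is symmetric
    top = np.argsort(np.abs(evals))[::-1][:d]        # d largest |eigenvalue|
    S_half = np.sqrt(np.abs(evals[top]))             # square roots of eigenvalues of |A|
    return evecs[:, top] * S_half                    # scale column k by S_half[k]

rng = np.random.default_rng(0)
n, d = 400, 2
centers = np.array([[0.8, 0.2], [0.2, 0.8]])         # hypothetical latent positions
X = centers[rng.integers(0, 2, size=n)]
P = X @ X.T                                          # sparsity factor rho_n = 1
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

X_hat = adjacency_spectral_embedding(A, d)
rel_err = np.linalg.norm(X_hat @ X_hat.T - P) / np.linalg.norm(P)
print(f"relative Frobenius error of X_hat X_hat^T against P: {rel_err:.3f}")
```

For graphs of the size considered in the paper a truncated eigensolver would be the natural replacement for the dense `eigh` call used here.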


Theorem 5 follows as an easy consequence of the more general Theorem 15, which we state below.

Theorem 15. Let $(A, X) \sim \mathrm{RDPG}(F)$ with rank $d$ second moment matrix and sparsity factor $\rho_n$. Let $E_n$ be the event that there exists a rotation matrix $W \in \mathbb{R}^{d \times d}$ such that

\[ \max_i \| \hat{X}_i - \rho_n^{1/2} W X_i \| \le \frac{C d^{1/2} \log^2 n}{\sqrt{n \rho_n}}, \]

where $C > 0$ is some fixed constant. Then $E_n$ occurs asymptotically almost surely.
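The quantity bounded in Theorem 15 can be examined numerically. The sketch below is an illustration under assumptions not made in the paper: the latent distribution, $n$, and $\rho_n$ are arbitrary, the rotation is chosen by orthogonal Procrustes alignment (one convenient choice, not necessarily the $W$ of the theorem), and the constant $C$ is set to 1 purely for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, rho = 500, 2, 1.0
centers = np.array([[0.8, 0.2], [0.2, 0.8]])     # hypothetical latent positions
X = centers[rng.integers(0, 2, size=n)]
P = rho * X @ X.T
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

# Adjacency spectral embedding X_hat = U_A S_A^{1/2}.
evals, evecs = np.linalg.eigh(A)
top = np.argsort(np.abs(evals))[::-1][:d]
X_hat = evecs[:, top] * np.sqrt(np.abs(evals[top]))

# Orthogonal Procrustes: W minimizes ||X_hat - sqrt(rho) X W||_F over orthogonal W.
# With rows as latent positions, this W plays the role of W^T in the theorem.
U, _, Vt = np.linalg.svd((np.sqrt(rho) * X).T @ X_hat)
W = U @ Vt

# Largest row-wise error versus the rate d^{1/2} log^2(n) / sqrt(n rho) (C = 1).
max_row_err = np.linalg.norm(X_hat - np.sqrt(rho) * X @ W, axis=1).max()
rate = np.sqrt(d) * np.log(n) ** 2 / np.sqrt(n * rho)
print(f"max row error {max_row_err:.3f}, rate {rate:.3f}")
```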


Proof of Theorem 15

The proof of Theorem 15 will follow from a succession of supporting results. We note that Theorem 18, which deals with the accuracy of spectral embedding estimates in Frobenius norm, may be of independent interest. In what follows, for a matrix $A \in \mathbb{R}^{m \times m}$, $\|A\|$ will denote the spectral norm of $A$.

We begin with a short proposition. 

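Proposition 16. Let $(A, X) \sim \mathrm{RDPG}(F)$ with sparsity factor $\rho_n$. Let $W_1 \Sigma W_2^\top$ be the singular value decomposition of $U_P^\top U_A$. Then asymptotically almost surely,

\[ \| U_P^\top U_A - W_1 W_2^\top \|_F = O((n\rho_n)^{-1}). \]

Proof: Let $\sigma_1, \sigma_2, \dots, \sigma_d$ denote the singular values of $U_P^\top U_A$ (the diagonal entries of $\Sigma$). Then $\sigma_i = \cos(\theta_i)$, where the $\theta_i$ are the principal angles between the subspaces spanned by $U_A$ and $U_P$. Furthermore, by the Davis-Kahan $\sin(\Theta)$ theorem (see, e.g., Theorem 3.6 in [53]),

\[ \| U_A U_A^\top - U_P U_P^\top \| = \max_i |\sin(\theta_i)| \le \frac{C \|A - P\|}{\lambda_d(P)} \]

for sufficiently large $n$. Recall here that $\lambda_d(P)$ denotes the $d$-th largest eigenvalue of $P$. The spectral norm bound for $A - P$ from Theorem 6 in [54] then gives

\[ \| U_A U_A^\top - U_P U_P^\top \| \le \frac{C \sqrt{n\rho_n}}{n\rho_n} = O((n\rho_n)^{-1/2}). \]

We thus have

\[
\begin{aligned}
\| U_P^\top U_A - W_1 W_2^\top \|_F = \| \Sigma - I \|_F
&= \sqrt{\sum_{i=1}^{d} (1 - \sigma_i)^2} \\
&\le \sum_{i=1}^{d} (1 - \sigma_i) \le \sum_{i=1}^{d} (1 - \sigma_i^2) = \sum_{i=1}^{d} \sin^2(\theta_i) \\
&\le d \, \| U_A U_A^\top - U_P U_P^\top \|^2 = O((n\rho_n)^{-1}),
\end{aligned}
\]

as desired. □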
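The chain of inequalities in the proof of Proposition 16 is deterministic once $U_A$ and $U_P$ are fixed, so it can be checked numerically. The following sketch does this on one simulated graph; the latent distribution, the sizes, and the helper `top_eigvecs` are assumptions introduced for the example.

```python
import numpy as np

def top_eigvecs(M, d):
    """Eigenvectors of symmetric M for the d eigenvalues largest in magnitude."""
    evals, evecs = np.linalg.eigh(M)
    top = np.argsort(np.abs(evals))[::-1][:d]
    return evecs[:, top]

rng = np.random.default_rng(1)
n, d = 300, 2
centers = np.array([[0.8, 0.2], [0.2, 0.8]])     # hypothetical latent positions
X = centers[rng.integers(0, 2, size=n)]
P = X @ X.T                                      # sparsity factor rho_n = 1
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

U_P, U_A = top_eigvecs(P, d), top_eigvecs(A, d)

# SVD of U_P^T U_A: singular values are cosines of the principal angles.
W1, sigma, W2t = np.linalg.svd(U_P.T @ U_A)
W_star = W1 @ W2t                                # the orthogonal matrix W* = W_1 W_2^T

lhs = np.linalg.norm(U_P.T @ U_A - W_star, "fro")          # ||U_P^T U_A - W*||_F
sin_max = np.linalg.norm(U_A @ U_A.T - U_P @ U_P.T, 2)     # max_i |sin(theta_i)|
print(f"||U_P^T U_A - W*||_F = {lhs:.2e}, d * sin_max^2 = {d * sin_max**2:.2e}")
```

The printed left-hand side equals $\|\Sigma - I\|_F$ and is bounded by $d \max_i \sin^2(\theta_i)$, matching the first and last terms of the display in the proof.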


Denote by $W^*$ the orthogonal matrix $W_1 W_2^\top$ as defined in the above proposition. We now establish the following key lemma. The lemma allows us to exchange the order of the orthogonal transformation $W^*$ and the diagonal scaling transformation $S_A$ or $S_P$.

Lemma 17. Let $(A, X) \sim \mathrm{RDPG}(F)$ with sparsity factor $\rho_n$. Then asymptotically almost surely,

\[ \| W^* S_A - S_P W^* \|_F = O(\log n) \]

and

\[ \| W^* S_A^{1/2} - S_P^{1/2} W^* \|_F = O(\log n \, (n\rho_n)^{-1/2}). \]

Proof: Let $R = U_A - U_P U_P^\top U_A$. We note that $R$ is the residual after projecting $U_A$ orthogonally onto the column space of $U_P$, and note

\[ \| U_A - U_P U_P^\top U_A \|_F = O((n\rho_n)^{-1/2}). \]

We derive that

\[
\begin{aligned}
W^* S_A &= (W^* - U_P^\top U_A) S_A + U_P^\top U_A S_A \\
&= (W^* - U_P^\top U_A) S_A + U_P^\top A U_A \\
&= (W^* - U_P^\top U_A) S_A + U_P^\top (A - P) U_A + U_P^\top P U_A \\
&= (W^* - U_P^\top U_A) S_A + U_P^\top (A - P) R + U_P^\top (A - P) U_P U_P^\top U_A + U_P^\top P U_A \\
&= (W^* - U_P^\top U_A) S_A + U_P^\top (A - P) R + U_P^\top (A - P) U_P U_P^\top U_A + S_P U_P^\top U_A.
\end{aligned}
\]

Writing $S_P U_P^\top U_A = S_P (U_P^\top U_A - W^*) + S_P W^*$ and rearranging terms, we obtain

\[
\begin{aligned}
\| W^* S_A - S_P W^* \|_F &\le \| W^* - U_P^\top U_A \|_F (\|S_A\| + \|S_P\|) + \| U_P^\top (A - P) R \|_F + \| U_P^\top (A - P) U_P U_P^\top U_A \|_F \\
&\le O(1) + O(1) + \| U_P^\top (A - P) U_P \|_F \, \| U_P^\top U_A \|
\end{aligned}
\]

asymptotically almost surely. Now, $\| U_P^\top U_A \| \le 1$. Hence we can focus on the term $U_P^\top (A - P) U_P$, which is a $d \times d$ matrix whose $ij$-th entry is of the form

\[
\begin{aligned}
u_i^\top (A - P) u_j &= \sum_{k=1}^{n} \sum_{l=1}^{n} (A_{kl} - P_{kl}) u_{ik} u_{jl} \\
&= 2 \sum_{k,l : k<l} (A_{kl} - P_{kl}) u_{ik} u_{jl} - \sum_{k} P_{kk} u_{ik} u_{jk},
\end{aligned}
\]

where $u_i$ and $u_j$ are the $i$-th and $j$-th columns of $U_P$. Thus, conditioned on $P$, $u_i^\top (A - P) u_j$ is a sum of independent mean 0 random variables and a term of order $O(1)$. Now, by Hoeffding's inequality,

\[ \Pr\Big[ \Big| \sum_{k,l : k<l} 2 (A_{kl} - P_{kl}) u_{ik} u_{jl} \Big| \ge t \Big] \le 2 \exp\Big( \frac{-2 t^2}{\sum_{k,l : k<l} (2 u_{ik} u_{jl})^2} \Big) \le 2 \exp(-t^2). \]

Therefore, each entry of $U_P^\top (A - P) U_P$ is of order $O(\log n)$ asymptotically almost surely, and as a consequence, $\| U_P^\top (A - P) U_P \|_F$ is of order $O(\log n)$ asymptotically almost surely. Hence,

\[ \| W^* S_A - S_P W^* \|_F = O(\log n) \]

asymptotically almost surely. We establish $\| W^* S_A^{1/2} - S_P^{1/2} W^* \|_F = O(\log n \, (n\rho_n)^{-1/2})$ by noting that the $ij$-th entry of $W^* S_A^{1/2} - S_P^{1/2} W^*$ can be written as

\[ W^*_{ij} \big( \lambda_j^{1/2}(A) - \lambda_i^{1/2}(P) \big) = W^*_{ij} \, \frac{\lambda_j(A) - \lambda_i(P)}{\lambda_j^{1/2}(A) + \lambda_i^{1/2}(P)} \]

and that the eigenvalues $\lambda_i^{1/2}(A)$ and $\lambda_i^{1/2}(P)$ are all of order $\Omega(\sqrt{n\rho_n})$ (see El). □

We next present Theorem 18, which extends earlier results on Frobenius norm accuracy of the adjacency spectral embedding from [18] even when the second moment matrix $\mathbb{E}[X_1 X_1^\top]$ does not have distinct eigenvalues.

Theorem 18. Let $(A, X) \sim \mathrm{RDPG}(F)$ with sparsity factor $\rho_n$. Let $E_n$ be the event that there exists a rotation matrix $W$ such that

\[ \| \hat{X} - \rho_n^{1/2} X W \|_F = \| (A - P) U_P S_P^{-1/2} \|_F + O(\log(n) (n\rho_n)^{-1/2}). \]

Then $E_n$ occurs asymptotically almost surely.

Proof: Let

\[ R_1 = U_P U_P^\top U_A - U_P W^*, \qquad R_2 = W^* S_A^{1/2} - S_P^{1/2} W^*. \]

We deduce that

\[
\begin{aligned}
\hat{X} - U_P S_P^{1/2} W^* &= U_A S_A^{1/2} - U_P W^* S_A^{1/2} + U_P (W^* S_A^{1/2} - S_P^{1/2} W^*) \\
&= (U_A - U_P U_P^\top U_A) S_A^{1/2} + R_1 S_A^{1/2} + U_P R_2 \\
&= U_A S_A^{1/2} - U_P U_P^\top U_A S_A^{1/2} + R_1 S_A^{1/2} + U_P R_2.
\end{aligned}
\]

Now, $U_P U_P^\top P = P$ and $U_A S_A^{1/2} = A U_A S_A^{-1/2}$. Hence

\[ \hat{X} - U_P S_P^{1/2} W^* = (A - P) U_A S_A^{-1/2} - U_P U_P^\top (A - P) U_A S_A^{-1/2} + R_1 S_A^{1/2} + U_P R_2. \]

Writing

\[ R_3 = U_A - U_P W^* = U_A - U_P U_P^\top U_A + R_1, \]

we derive that

\[
\begin{aligned}
\hat{X} - U_P S_P^{1/2} W^* &= (A - P) U_P W^* S_A^{-1/2} - U_P U_P^\top (A - P) U_P W^* S_A^{-1/2} \\
&\quad + (I - U_P U_P^\top)(A - P) R_3 S_A^{-1/2} + R_1 S_A^{1/2} + U_P R_2.
\end{aligned}
\]

Now

\[ \|R_1\|_F = O((n\rho_n)^{-1}), \qquad \|R_2\|_F = O(\log n \, (n\rho_n)^{-1/2}), \qquad \text{and} \qquad \|R_3\|_F = O((n\rho_n)^{-1/2}); \]

indeed, we recall

\[ \| U_A - U_P U_P^\top U_A \|_F = O((n\rho_n)^{-1/2}). \]

Furthermore, Hoeffding's inequality guarantees that

\[ \| U_P U_P^\top (A - P) U_P W^* S_A^{-1/2} \|_F \le \| U_P^\top (A - P) U_P \|_F \, \| S_A^{-1/2} \|_F = O(\log n \, (n\rho_n)^{-1/2}). \]

As a consequence,

\[
\begin{aligned}
\| \hat{X} - U_P S_P^{1/2} W^* \|_F &= \| (A - P) U_P W^* S_A^{-1/2} \|_F + O(\log n \, (n\rho_n)^{-1/2}) \\
&= \| (A - P) U_P S_P^{-1/2} W^* - (A - P) U_P (S_P^{-1/2} W^* - W^* S_A^{-1/2}) \|_F + O(\log n \, (n\rho_n)^{-1/2}).
\end{aligned}
\]

Using a very similar argument as that employed in the proof of Lemma 17, we can show that

\[ \| S_P^{-1/2} W^* - W^* S_A^{-1/2} \|_F = O(\log n \, (n\rho_n)^{-3/2}). \]

Recall that

\[ \| (A - P) U_P (S_P^{-1/2} W^* - W^* S_A^{-1/2}) \|_F \le \| (A - P) U_P \| \, \| S_P^{-1/2} W^* - W^* S_A^{-1/2} \|_F. \]

Further, as already mentioned, Theorem 6 of [54] ensures that $\|A - P\|$ is of order $O(\sqrt{n\rho_n})$ asymptotically almost surely; this implies, of course, identical bounds on $\|(A - P) U_P\|$. We conclude that

\[
\begin{aligned}
\| \hat{X} - U_P S_P^{1/2} W^* \|_F &= \| (A - P) U_P S_P^{-1/2} W^* \|_F + O(\log(n) (n\rho_n)^{-1/2}) \\
&= \| (A - P) U_P S_P^{-1/2} \|_F + O(\log(n) (n\rho_n)^{-1/2}).
\end{aligned}
\tag{13}
\]

Finally, to complete the proof, we note that

\[ \rho_n^{1/2} X = U_P S_P^{1/2} W \]

for some orthogonal matrix $W$. Since $W^*$ is also orthogonal, we conclude that there exists some orthogonal $W$ for which

\[ \rho_n^{1/2} X W = U_P S_P^{1/2} W^*, \]

as desired. □

We are now ready to prove Theorem 15.

Proof: To establish Theorem 15, we begin by noting that

\[ \| \hat{X} - \rho_n^{1/2} X W \|_F = \| (A - P) U_P S_P^{-1/2} \|_F + O(\log(n) (n\rho_n)^{-1/2}) \]

and hence

\[
\begin{aligned}
\max_i \| \hat{X}_i - \rho_n^{1/2} W X_i \| &\le \frac{1}{\lambda_d^{1/2}(P)} \max_i \| ((A - P) U_P)_i \| + O(\log(n) (n\rho_n)^{-1/2}) \\
&\le \frac{d^{1/2}}{\lambda_d^{1/2}(P)} \max_j \| (A - P) u_j \|_\infty + O(\log(n) (n\rho_n)^{-1/2}),
\end{aligned}
\]

where $u_j$ denotes the $j$-th column of $U_P$. Now, for a given $j$ and a given index $i$, the $i$-th element of the vector $(A - P) u_j$ is of the form

\[ \sum_{k} (A_{ik} - P_{ik}) u_{jk}, \]

and once again, by Hoeffding's inequality, the above term is $O(\log n)$ asymptotically almost surely. Taking the union bound over all indices $i$ and all columns $j$ of $U_P$, we conclude

\[
\begin{aligned}
\max_i \| \hat{X}_i - \rho_n^{1/2} W X_i \| &\le \frac{C d^{1/2}}{\lambda_d^{1/2}(P)} \log^2(n) + O(\log(n) (n\rho_n)^{-1/2}) \\
&\le \frac{C d^{1/2} \log^2 n}{\sqrt{n\rho_n}},
\end{aligned}
\]

as desired. □


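Proof of Lemma 6

Our assumption that $p < q$, together with Theorem 5, gives that

\[ \hat{p} := \max_{i \neq j} \max_{\ell, h} \langle \hat{\xi}_i^{(2)}(\ell,:), \hat{\xi}_j^{(2)}(h,:) \rangle \]

and

\[ \hat{q} := \min_{i} \min_{\ell, h} \langle \hat{\xi}_i^{(2)}(\ell,:), \hat{\xi}_i^{(2)}(h,:) \rangle \]

are such that $\hat{p} < \hat{q}$ asymptotically almost surely. The proof of Lemma 6 follows from the following proposition and the fact that $\hat{p} < \hat{q}$ asymptotically almost surely.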


Proposition 19. Given the assumptions of Lemma 6 and Theorem 5, let $E_n$ be the event that the set $S_n$ obtained in Algorithm 2 satisfies

\[ |S_n \cap C_i| = 1 \]

for all $i \in [R]$. Then $E_n$ occurs asymptotically almost surely.

Proof: For each $i \in [R]$, define $C_i =$

The proposition follows immediately from proving

(1) For all $i \in [n]$, if $\hat{X}(i,:)$ belongs to $C_j$ and $|S_{i-1} \cap C_j| = 0$, then $\hat{X}(i,:)$ will be added to $S_{i-1}$.

(2) For all $i \in [n]$, if $s \in S_{i-1}$ belongs to $C_j$ and $|S_{i-1} \cap C_j| = 1$, then $s \in S_i$ (i.e., $s$ will not be removed from $S_{i-1}$).

For (1), for fixed $i \in [n]$, if $\hat{X}(i,:)$ belongs to $C_j$ and $|S_{i-1} \cap C_j| = 0$, then

\[ \max_{s \in S_{i-1}} \langle \hat{X}(i,:), s \rangle \le \hat{p}. \]

By the pigeonhole principle, there exist $y, z \in S_{i-1}$ such that $y, z \in C_k$ for some $k \in [R]$, $k \neq j$. Thus $\langle y, z \rangle \ge \hat{q}$, and

\[ \max_{s \in S_{i-1}} \langle \hat{X}(i,:), s \rangle < \max_{x, w \in S_{i-1}} \langle x, w \rangle, \]

and hence $\hat{X}(i,:)$ will be added to $S_{i-1}$.

For (2), for fixed $i \in [n]$, suppose $s \in S_{i-1}$ belongs to $C_j$ and $|S_{i-1} \cap C_j| = 1$. Consider two cases. First, suppose that for each $k \in [R]$, $|S_{i-1} \cap C_k| = 1$. Then

\[ \max_{s \in S_{i-1}} \langle \hat{X}(i,:), s \rangle \ge \hat{q} > \hat{p} \ge \max_{x, w \in S_{i-1}} \langle x, w \rangle, \]

and $\hat{X}(i,:)$ will not be added to $S_{i-1}$, and so $s \in S_i$. Otherwise, there exist $y, z \in S_{i-1}$ satisfying $y, z \in C_k$ for some $k \in [R]$, $k \neq j$. Therefore

\[ \max_{x \in S_{i-1}} \langle x, s \rangle \le \hat{p} < \hat{q} \le \langle y, z \rangle \le \max_{x, w \in S_{i-1}} \langle x, w \rangle, \]

and even if $\hat{X}(i,:)$ is added to $S_{i-1}$, then $s$ will not be removed from $S_{i-1}$, as desired. □

To finish the proof of Lemma 6 from Proposition 19, the set $S_n$ will contain exactly one element of $C_j$ for each $j \in [R]$ asymptotically almost surely. For each $i \in [n]$, if $\hat{X}(i,:) \in C_k$, then asymptotically almost surely

\[ \operatorname{argmax}_{s \in S_n} \langle \hat{X}(i,:), s \rangle \in C_k, \]

as desired.
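The selection and classification steps analyzed above can be sketched as follows. Algorithm 2 itself is not reproduced in this appendix, so the update rule below is inferred from the proof of Proposition 19 and should be read as an assumption: a new row replaces one member of the currently most similar pair in $S$ whenever its largest inner product with $S$ is smaller than that pair's inner product. The function names, the initialization with the first $R$ rows, and the synthetic rows standing in for $\hat{X}$ are likewise hypothetical.

```python
import numpy as np

def select_representatives(X_hat, R):
    """Greedy pass inferred from the proof: maintain R row indices whose
    pairwise inner products are driven down as rows are scanned."""
    S = list(range(R))                           # start from the first R rows
    for i in range(R, X_hat.shape[0]):
        G = X_hat[S] @ X_hat[S].T                # inner products within S
        np.fill_diagonal(G, -np.inf)             # ignore <x, x>
        a, b = np.unravel_index(np.argmax(G), G.shape)
        # Row i joins S if it is less similar to S than the closest pair in S.
        if (X_hat[S] @ X_hat[i]).max() < G[a, b]:
            S[a] = i                             # drop one member of that pair
    return S

def classify(X_hat, S):
    """Assign each row to the representative with the largest inner product."""
    return np.argmax(X_hat @ X_hat[S].T, axis=1)

rng = np.random.default_rng(2)
labels = np.repeat([0, 1, 2], 20)                # three clusters, 20 rows each
# Synthetic stand-in for X_hat: within-cluster inner products near 1,
# between-cluster inner products near 0, so p_hat < q_hat holds.
X_hat = np.eye(3)[labels] + 0.05 * rng.uniform(-1, 1, size=(60, 3))

S = select_representatives(X_hat, R=3)
pred = classify(X_hat, S)
print("clusters of selected rows:", sorted(labels[S].tolist()))
```

In this example the first $R$ rows all come from the same cluster, so the run exercises both cases in the proof: rows from unrepresented clusters are added, and a cluster with a single representative never loses it.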


